Hi, @Lemmy-westin, and All,
Thinking again about your problem, I succeeded in building a general method and the corresponding regexes !
So, let’s suppose you have a text, separated into TWO parts by a single line, built of some # characters.
Then, you may like to search for :
Case D1 : Lines, which lie, ONLY, in the FIRST part of the text ( BEFORE the ###### line )
Case E1 : Lines, which lie, BOTH, in the TWO parts of the text ( BEFORE and AFTER the ###### line )
Case D2 : Parts of line, which lie, ONLY, in the FIRST part of the text ( BEFORE the ###### line )
Case E2 : Parts of line, which lie, BOTH, in the TWO parts of the text ( BEFORE and AFTER the ###### line )
Case D3 : Single words, which lie, ONLY, in the FIRST part of the text ( BEFORE the ###### line )
Case E3 : Single words, which lie, BOTH, in the TWO parts of the text ( BEFORE and AFTER the ###### line )
Remark :
If you want to search for ranges, in the SECOND part of text, exclusively, just swap the two parts of text and use, either, the case D1, D2 or D3 !
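If you prefer to do the swap with a script rather than by hand, here is a minimal Python sketch. The helper name `swap_parts` is hypothetical, and it assumes Unix line endings and a separator line made ONLY of # characters :

```python
import re

def swap_parts(text: str) -> str:
    """Swap the text before and after the single separator line."""
    lines = text.split("\n")
    # Indexes of the lines made only of '#' characters (assumed separator shape)
    seps = [i for i, line in enumerate(lines) if re.fullmatch(r"#+", line)]
    if len(seps) != 1:
        raise ValueError("expected exactly one separator line of '#' characters")
    i = seps[0]
    # Second part first, then the separator, then the first part
    return "\n".join(lines[i + 1:] + [lines[i]] + lines[:i])

swapped = swap_parts("a\nb\n#####\nc")
print(swapped)  # c, #####, a, b (one item per line)
```

After the swap, the D1, D2 or D3 searches report the ranges which were, originally, in the SECOND part only.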
To, correctly, define these three ranges of text, we’ll use a start boundary and an end boundary. They will be used, in the look-behind and look-ahead structures, and will NEVER be part of the regex to search for !
For cases D1 and E1 :
Start boundary = ^ ( Beginning of line ) OR \R ( End of Line characters of previous line )
End boundary = \R ( End of line character(s) = \r\n in Windows files or \n in Unix files )
Searched regex = .+ ( All standard characters of any NON-blank line )
For cases D2 and E2 :
Start boundary = % ( A dummy character, NOT already used in the current text )
End boundary = % ( The same character, as above )
Searched regex = .+ ( Any NON-null range of standard characters, between the two % excluded limits )
For cases D3 and E3 :
Start boundary = \W ( A NON-word character, so, any character different from [0-9A-Za-z_] and from all accented characters. This, also, includes the End of Line characters )
End boundary = \W ( A NON-word character, as above )
Searched regex = (\w+) ( A complete single word, of any length, between two excluded NON-word characters )
Now, here are the regexes to achieve these different searches :
Case D1 : (?i)^(.+)(?s)(?=\R.*#+(?!.*\R\1(\R|\z))) OR (?i)^(.+)(?s)(?=\R.*#+)(?!.*#+.*\R\1(\R|\z))
Case E1 : (?i)^(.+)(?s)(?=\R.*#+(?=.*\R\1(\R|\z))) OR (?i)^(.+)(?s)(?=\R.*#+.*\R\1(\R|\z))
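For readers who want to try the D1 and E1 logic outside Notepad++, here is a rough Python port. It is a sketch under assumptions, not the regexes above verbatim : Python's re module has no \R nor \z, so \n ( Unix line endings assumed ) and \Z stand in for them, and [\s\S]* replaces .* under (?s), because recent Python versions reject a (?s) placed in the middle of a pattern. The sample text is made up for the demonstration.

```python
import re

# Assumptions: "\n" replaces \R, "\Z" replaces \z, "[\s\S]*" replaces (?s).*
# D1: lines of the first part that do NOT reappear after the ##### line
D1 = re.compile(r"^(.+)(?=\n[\s\S]*#+(?![\s\S]*\n\1(?:\n|\Z)))", re.I | re.M)
# E1: lines of the first part that DO reappear after the ##### line
E1 = re.compile(r"^(.+)(?=\n[\s\S]*#+[\s\S]*\n\1(?:\n|\Z))", re.I | re.M)

sample = "alpha\nbeta\ngamma\n#####\nBETA\ndelta"

only_first = [m.group(1) for m in D1.finditer(sample)]
in_both = [m.group(1) for m in E1.finditer(sample)]
print(only_first)  # ['alpha', 'gamma']
print(in_both)     # ['beta']
```

Note that beta is reported by E1 although the second part contains BETA, because of the case insensitive flag.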
You may test the D1 and E1 regexes with, for instance, the text, below, in a NEW tab :
When we speak of free
software, we are referring to
freedom, not price. Our General
When we speak of free
software, we are referring to
make sure that you have the
freedom to distribute copies
This is a simple test
#########################################
This IS A simple TEST
When we SPEAK of free
freedom, not price. Our General
make sure that you have the
freedom, not price. Our General

Case D2 : (?i)(?<=%)(.+)(?s)(?=%.*#+(?!.*%\1%)) OR (?i)(?<=%)(.+)(?s)(?=%.*#+)(?!%.*#+.*%\1%)
Case E2 : (?i)(?<=%)(.+)(?s)(?=%.*#+(?=.*%\1%)) OR (?i)(?<=%)(.+)(?s)(?=%.*#+.*%\1%)
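The same idea, with the % boundaries, can be sketched in Python. Again, this is an adaptation under assumptions : [\s\S]* replaces .* under (?s), the sample text is invented, and each line is supposed to hold ONE %...% pair only.

```python
import re

# D2: %...% ranges of the first part that do NOT reappear after the ##### line
D2 = re.compile(r"(?<=%)(.+)(?=%[\s\S]*#+(?![\s\S]*%\1%))", re.I)
# E2: %...% ranges of the first part that DO reappear after the ##### line
E2 = re.compile(r"(?<=%)(.+)(?=%[\s\S]*#+[\s\S]*%\1%)", re.I)

# Made-up sample: one %...% pair per line, the % signs are never matched
sample = "1 %red car% 1\n2 %blue sky% 2\n#####\n3 %RED CAR% 3"

only_first = [m.group(1) for m in D2.finditer(sample)]
in_both = [m.group(1) for m in E2.finditer(sample)]
print(only_first)  # ['blue sky']
print(in_both)     # ['red car']
```

The text located outside the % signs ( 1, 2, 3 ) is never reported, as it is not followed by a % character.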
You may test the D2 and E2 regexes with, for instance, the text, below, in a NEW tab :
111 %When we speak of free% 111
222,%software, we are referring to%,222
333 % freedom, not price. Our General% 333
abc %When we speak of free% abc
xyz,%software, we are referring to%,xyz
%make sure that you have the%
555 %freedom to distribute copies% 555
666:%This is a simple test%:666
#####################################################################
777|||%This is A simple TEST%|||777
888----%When we SPEAK of free%----888
999% freedom, not price. Our General%999
abc %make sure that you have the% abc
000000000% freedom, not price. Our General%0000000000000000
------------- %make sure that you have the% ------------

Case D3 : (?si)(?<=\W)(\w+)(?=\W.*#+(?!.*\W\1(\W|\z))) OR (?si)(?<=\W)(\w+)(?=\W.*#+)(?!.*#+.*\W\1(\W|\z))
Case E3 : (?si)(?<=\W)(\w+)(?=\W.*#+(?=.*\W\1(\W|\z))) OR (?si)(?<=\W)(\w+)(?=\W.*#+.*\W\1(\W|\z))
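Here is a comparable Python sketch for the single words case. It assumes the same substitutions as for the other cases ( \Z for \z and [\s\S]* for .* under (?s) ) and a made-up sample. The sample begins with a line break on purpose : the (?<=\W) look-behind needs a NON-word character before each word, so a word located at the very beginning of the text would not be reported.

```python
import re

# D3: words of the first part that do NOT reappear after the ##### line
D3 = re.compile(r"(?<=\W)(\w+)(?=\W[\s\S]*#+(?![\s\S]*\W\1(?:\W|\Z)))", re.I)
# E3: words of the first part that DO reappear after the ##### line
E3 = re.compile(r"(?<=\W)(\w+)(?=\W[\s\S]*#+[\s\S]*\W\1(?:\W|\Z))", re.I)

# Leading "\n" so that the first word has a non-word character before it
sample = "\nfree software price\n#####\nSOFTWARE test"

only_first = [m.group(1) for m in D3.finditer(sample)]
in_both = [m.group(1) for m in E3.finditer(sample)]
print(only_first)  # ['free', 'price']
print(in_both)     # ['software']
```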
You may test the D3 and E3 regexes with, for instance, the text, below, in a NEW tab :
software price freedom SOFtware prICE General Public This is a simple test to find out identical / different words inside that text
##########################################################################################
This, is A test in order to know the same / different words of the text SoftwarE freeDOM genERal FREEDOM

Notes :
The last cases D3 and E3 are the ones, discussed in my previous topic
All the regexes, above, are case insensitive. If searches must be sensitive, just change the (?i) syntaxes into (?-i) and the (?si) syntaxes into (?s-i)
Remember that your text must contain just ONE line with, at least, one # character
Regarding the D1, D2 and D3 equivalent regexes, their general templates are :
[Modifiers][Positive Look-Behind][Regex to Search][Positive Look-Ahead[Negative Look-Ahead]], with nested look-aheads
[Modifiers][Positive Look-Behind][Regex to Search][Positive Look-Ahead][Negative Look-Ahead], with juxtaposed look-aheads
Regarding the E1, E2 and E3 equivalent regexes, their general templates are :
[Modifiers][Positive Look-Behind][Regex to Search][Positive Look-Ahead[Positive Look-Ahead]], with nested look-aheads
[Modifiers][Positive Look-Behind][Regex to Search][Positive Look-Ahead], with 1 look-ahead, only
Just notice that a positive look-ahead, nested in another positive look-ahead, may be merged into a single look-ahead. But it’s impossible to merge a negative look-ahead, nested in a positive look-ahead !
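This remark on look-aheads can be checked on a tiny, made-up example. The Python sketch below uses toy patterns on the letters a, b and c ( they are NOT the regexes of this post ) and the hypothetical helper `starts`, which lists the offsets of all the matches :

```python
import re

text = "abc abd ax"

def starts(pattern: str) -> list[int]:
    """Return the start offsets of all matches of pattern in text."""
    return [m.start() for m in re.finditer(pattern, text)]

# Positive look-ahead nested in a positive one: same result once merged
nested_pos = starts(r"a(?=b(?=c))")
merged_pos = starts(r"a(?=bc)")

# Negative look-ahead nested in a positive one: 'a', then 'b', NOT then 'c'
nested_neg = starts(r"a(?=b(?!c))")
# Juxtaposed form: the negative look-ahead repeats the positive part
juxtaposed = starts(r"a(?=b)(?!bc)")
# Naive merge into one negative look-ahead: also accepts the 'a' of "ax"
naive_merge = starts(r"a(?!bc)")

print(nested_pos, merged_pos)  # [0] [0]
print(nested_neg, juxtaposed)  # [4] [4]
print(naive_merge)             # [4, 8]
```

The naive merge loses the condition that a b must follow, which is why the D1, D2 and D3 regexes keep either the nested form or the two juxtaposed look-aheads.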
Of course, as usual, you may replace, delete, mark or bookmark the different matches, for further modifications !
Cheers,
guy038