Some thoughts about regex assertions, conditional structures and (non-)optional groups !
Hi All,
Recently, I’ve learned something about zero-length assertions, such as `^` or `$`. Indeed, as they are simply predefined look-arounds that match an empty string, you may surround them with parentheses, followed by the `?` quantifier, in order to create a NON-optional capturing group which is possibly absent ! So :

- If the location of the text to be matched satisfies the assertion, that group is defined and any conditional regex structure using that group will be in the TRUE state
- If the location of the text to be matched cannot satisfy the assertion, that group is NOT defined and any conditional regex structure using that group will be in the FALSE state
For instance, let’s consider the following regex S/R :

SEARCH `(^)?ABC($)?`, where both assertions `^` and `$` are stored as NON-optional groups `1` and `2`, possibly absent due to the `?` quantifier

REPLACE `?1(?2<123>:<123):(?{2}123>:123)`, which represents a conditional replacement structure relative to groups `1` and `2`

Given the sample data below :

    ABC
    ABC test
    test ABC
    test ABC test

you should get the text :

    <123>
    <123 test
    test 123>
    test 123 test

It’s easy to notice that the replacement depends on the location of the ABC string :
- If the ABC string is alone on a line, the two assertions are satisfied. So both groups `1` and `2` exist => `<123>`
- If the ABC string begins a line, only the `^` assertion is satisfied. So only group `1` exists => `<123`
- If the ABC string ends a line, only the `$` assertion is satisfied. So only group `2` exists => `123>`
- If an ABC string is embedded in a line, NO assertion can be satisfied. So neither group exists => `123`
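For those who want to check the four cases above outside Notepad++, here is a minimal sketch in Python. It is only an approximation, under a few assumptions : Python's `re` module has no conditional replacement syntax, so the hypothetical helper `decorate` tests by hand whether groups `1` and `2` are defined ( `None` means NOT defined ), and the `re.MULTILINE` flag is used so that `^` and `$` work line by line :

```python
import re

# Same search regex as above; MULTILINE makes ^ and $ match at every line boundary
PATTERN = re.compile(r"(^)?ABC($)?", re.MULTILINE)

def decorate(match):
    """Hypothetical stand-in for the conditional replacement ?1(?2...:...):(...)."""
    group1_defined = match.group(1) is not None  # ^ was satisfied
    group2_defined = match.group(2) is not None  # $ was satisfied
    prefix = "<" if group1_defined else ""
    suffix = ">" if group2_defined else ""
    return prefix + "123" + suffix

sample = "ABC\nABC test\ntest ABC\ntest ABC test"
result = PATTERN.sub(decorate, sample)
print(result)
```

When a group takes part in the match, Python reports it as an empty string; when it does not, it reports `None`, which plays the role of the FALSE state here.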
Reminder :

The conditional structures below :

- `(?(##)Regex_if_TRUE|Regex_if_FALSE)` ( in the Search regex )
- `(?{##}Regex_if_TRUE:Regex_if_FALSE)` ( in the Replacement regex )

both refer to the `##`th capturing group of the search regex, which can be defined or not

However, in order to be effective, these structures must concern a NON-optional group only !
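The search-side form `(?(##)...|...)` also exists in Python's `re` module, so this last point can be tried with a small sketch. The trailing `!` / `?` marker and the variable names are assumptions made up for this illustration only, and the `(?-i)` prefix is left out because Python is case-sensitive by default :

```python
import re

# Group 1 is (\d*): it always takes part in the match, even when it is empty
always_defined = re.compile(r"ABC(\d*)ABC(?(1)!|\?)")

# Group 1 is (\d+)?: it is skipped entirely when there are no digits
possibly_absent = re.compile(r"ABC(\d+)?ABC(?(1)!|\?)")

tests = ["ABCABC!", "ABCABC?", "ABC12345ABC!"]
for text in tests:
    first = always_defined.fullmatch(text) is not None
    second = possibly_absent.fullmatch(text) is not None
    print(text, first, second)
```

With the first pattern, the FALSE branch `\?` can never be taken, so the conditional is of no use; with the second one, both branches are reachable.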
Indeed, let’s consider group `1` in the regex `(?-i)ABC(\d*)ABC`. A condition on the group `(\d*)` is always TRUE, as it refers to either :

- An existent NON-empty group `1` ( some digits ) => Group `1` is defined and the condition is TRUE
- An existent empty group `1` ( an empty string ) => Group `1` is defined and the condition is TRUE

Now, if we rebuild the regex this way, `(?-i)ABC(\d+)?ABC`, this time the regex `(\d+)?` refers to a NON-optional group `1`, possibly absent. So the regex refers to either :

- The existent group `1` ( some digits ) => Group `1` is defined and the condition is TRUE
- The NON-existent group `1` ( no digits between the two ABC strings ) => Group `1` is NOT defined and the condition is FALSE
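The difference between an empty group and a NON-existent group can be observed directly with a short Python sketch. This is an assumption-laden illustration, not the Notepad++ engine itself : the names `star_group` and `optional_group` are invented here, and Python shows a group which did not take part in the match as `None` :

```python
import re

star_group = re.compile(r"ABC(\d*)ABC")       # group 1 may be empty but is always defined
optional_group = re.compile(r"ABC(\d+)?ABC")  # group 1 is either some digits or NOT defined

captures = []
for line in ("ABCABC", "ABC12345ABC"):
    with_star = star_group.fullmatch(line).group(1)
    with_optional = optional_group.fullmatch(line).group(1)
    captures.append((line, with_star, with_optional))
    print(line, repr(with_star), repr(with_optional))
```

For the line without digits, the first regex captures an empty string, whereas the second one captures nothing at all, which is exactly what a conditional structure tests.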
So, given the two sample lines :

    ABCABC
    ABC12345ABC

The regex S/R :

SEARCH `(?-i)ABC(\d*)ABC`

REPLACE `(?1True:False)`

gives the text :

    True
    True

whereas the similar regex S/R :

SEARCH `(?-i)ABC(\d+)?ABC`

REPLACE `(?1True:False)`

gives the text :

    False
    True

Cheers,
guy038