    Some thoughts about regex assertions, conditional structures and (non-)optional groups !

      guy038

      Hi All,

      Recently, I learned something about zero-length assertions, such as ^ or $. Since they behave like predefined look-arounds that match an empty string, you may surround them with parentheses and follow the group with the ? quantifier. This creates a capturing group whose content is NOT optional, but which may be absent from the match ! So :

      • If the assertion holds at the location of the text to be matched, the group is defined and any conditional regex structure that tests that group is in the TRUE state

      • If the assertion does not hold at the location of the text to be matched, the group is NOT defined and any conditional regex structure that tests that group is in the FALSE state


      For instance, let’s consider the following regex S/R :

      SEARCH (^)?ABC($)? , where the assertions ^ and $ are stored in groups 1 and 2, each of which may be absent because of the ? quantifier

      REPLACE ?1(?2<123>:<123):(?{2}123>:123) , which represents a conditional replacement structure, relative to groups 1 and 2

      Given the sample data, below :

      ABC
      ABC test
      test ABC
      test ABC test
      

      you should get the text :

      <123>
      <123 test
      test 123>
      test 123 test
      

      It’s easy to see that the replacement depends on the location of the ABC string :

      • If the ABC string is alone on a line, both assertions are verified. So, groups 1 and 2 both exist => <123>

      • If the ABC string begins a line, only the ^ assertion is verified. So, only group 1 exists => <123

      • If the ABC string ends a line, only the $ assertion is verified. So, only group 2 exists => 123>

      • If the ABC string is embedded in a line, NO assertion can be verified. So, neither group exists => 123
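
As a rough cross-check outside Notepad++, the four cases can be sketched in Python. This is only an approximation: Python's re module is not the Boost engine used by Notepad++ and has no conditional replacement syntax, so a hypothetical callback, replace_abc, stands in for the REPLACE expression. It relies on Python reporting a group that did not take part in the match as None, while a group that matched an empty string is reported as "".

```python
import re

# Same search pattern as in the post; re.MULTILINE makes ^ and $ apply to each line.
PATTERN = re.compile(r"(^)?ABC($)?", re.MULTILINE)


def replace_abc(match: re.Match) -> str:
    """Hypothetical stand-in for the conditional REPLACE expression."""
    # A group that took part in the match holds "", an absent group is None.
    at_line_start = match.group(1) is not None
    at_line_end = match.group(2) is not None
    return ("<" if at_line_start else "") + "123" + (">" if at_line_end else "")


sample = "ABC\nABC test\ntest ABC\ntest ABC test\n"
result = PATTERN.sub(replace_abc, sample)
print(result)
```

The callback tests "is the group defined ?" rather than "is the group non-empty ?", which is the same distinction the conditional replacement makes.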


      Reminder :

      The conditional structures, below :

      • (?(##)Regex_if_TRUE|Regex_if_FALSE) ( in the Search regex )

      • (?{##}Regex_if_TRUE:Regex_if_FALSE) ( in the Replacement regex )

      both refer to capturing group ## of the search regex, which may or may not be defined

      However, to be effective, these structures must test a group that can really be undefined, i.e. a group with NON-optional content that is itself followed by the ? quantifier !
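
The search-side form, (?(##)Regex_if_TRUE|Regex_if_FALSE), also exists in Python's re module, so it can be tried there. The sketch below is an illustration only, not taken from the post: the pattern and the helper name is_balanced are assumptions chosen for the example. Group 1 is an optional opening parenthesis, and the conditional demands a closing parenthesis only when group 1 took part in the match.

```python
import re

# Group 1 may be absent; (?(1)\)) requires ")" only if group 1 is defined.
# The FALSE branch is omitted, so it matches an empty string.
BALANCED = re.compile(r"(\()?\d+(?(1)\))")


def is_balanced(text: str) -> bool:
    """Hypothetical helper: True if the whole text is 123 or (123) style."""
    return BALANCED.fullmatch(text) is not None


checks = {s: is_balanced(s) for s in ["(123)", "123", "(123", "123)"]}
print(checks)
```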


      Indeed, consider group 1 in the regex (?-i)ABC(\d*)ABC. A condition on the group (\d*) is always TRUE, because that group always takes part in the match. It refers to either :

      • An existent NON-empty group 1 ( Some digits ) => Group 1 is defined and the condition is TRUE

      • An existent empty group 1 ( An empty string ) => Group 1 is defined and the condition is TRUE

      Now, if we rebuild the regex as (?-i)ABC(\d+)?ABC, the sub-regex (\d+)? defines a group 1 whose content is NOT optional ( at least one digit ), but which may be absent. So, it refers to either :

      • The existent group 1 ( Some digits ) => Group 1 is defined and the condition is TRUE

      • The NON-existent group 1 ( No digits between the two ABC strings ) => Group 1 is NOT defined and the condition is FALSE


      So, given the two sample lines :

      ABCABC
      ABC12345ABC
      

      The regex :

      SEARCH (?-i)ABC(\d*)ABC

      REPLACE (?1True:False)

      gives the text :

      True
      True
      

      whereas the similar regex :

      SEARCH (?-i)ABC(\d+)?ABC

      REPLACE (?1True:False)

      gives the text :

      False
      True
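
The difference between the two searches can also be observed in Python, again only as an approximation of the Notepad++ behaviour. The helper name condition is hypothetical, and the (?-i) prefix is dropped because Python's re is case-sensitive by default and does not accept that global form. With (\d*), group 1 is always defined, possibly as an empty string. With (\d+)?, group 1 is None when no digit is present.

```python
import re

samples = ["ABCABC", "ABC12345ABC"]


def condition(pattern: str, text: str) -> str:
    """Hypothetical stand-in for the replacement (?1True:False)."""
    match = re.fullmatch(pattern, text)
    if match is None:
        return "no match"
    # The test is "did group 1 take part ?", not "is group 1 non-empty ?".
    return "True" if match.group(1) is not None else "False"


always_defined = [condition(r"ABC(\d*)ABC", s) for s in samples]
possibly_absent = [condition(r"ABC(\d+)?ABC", s) for s in samples]
print(always_defined)
print(possibly_absent)
```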
      

      Cheers,

      guy038

      The Community of users of the Notepad++ text editor.