    How to match all content between two XML tags except if a certain tag occurs between them?

    • Elias Mossholm

      I have an XML file used to define test cases for a specific program. I’d like to find all test cases with at least 8 step changes (a step change is characterized by a specific set of tags (<step> etc.), see second test case in example below) and remove the last 3 step changes. There may be any tags between the first 5 step changes but the match must stay within a single test case.

      A simplified version of this is to find “[start of test case] followed by [any text except “end of test case”] followed by [next step]”, which is what I’m trying to solve below (unsuccessfully).

      Example text to search in:

      <unit-test name="3d.">
          <units>
              <multiset>
                  <set action="commit" parameter="variant_field" value="variant1"/>
                  <set action="commit" parameter="variant_field2" value="variant2"/>
              </multiset>
              <assert-param-value>
                  <parameter>type_field</parameter>
                  <value>type1</value>
                  <operation>=</operation>
              </assert-param-value>
              <commit>
                  <parameter>type_field2</parameter>
                  <value>myType</value>
                  <accept>false</accept>
              </commit>
              <assert-param-hidden>
                  <parameter>type_field5</parameter>
                  <hidden>true</hidden>
              </assert-param-hidden>
          </units>
      </unit-test>
      <unit-test name="3e.">
          <units>
              <assert-attribute-value>
                  <attribute>part.subpart.subsubpart.block[3].anotherPart.type</attribute>
                  <value>type7</value>
                  <operation>=</operation>
              </assert-attribute-value>
              <step>
                  <stepName>next</stepName>
              </step>
              <commit>
                  <parameter>part_field</parameter>
                  <value>part1</value>
                  <accept>false</accept>
              </commit>
          </units>
      </unit-test>
      

      The following expression successfully matches “[start of test case] followed by [any text] followed by [next step]”, but since I left out the [except “end of test case”] part, it crosses the test case border if I search from the beginning of the above text.
      Note that I use the “. matches newline” setting in the search dialog.

      (<unit-test.*?)( *<step>\r\n\ *<stepName>next</stepName>\r\n\ *</step>)
      

      I tried adding a negative lookahead (?!</units>) (strictly speaking </units> is the second-last tag, but it only occurs at the end of each test case so it should work just as well as </unit-test> would):

      (<unit-test.(?!</units>)*?)( *<step>\r\n\ *<stepName>next</stepName>\r\n\ *</step>)
      

      …but that is an invalid expression according to np++.

      This being the first time I use negative lookaheads (or any assertions), I tried rearranging the above expression so that the negative assertion comes after the full .*? instead of between the . and the *?:

      (<unit-test.*?(?!</units>))( *<step>\r\n\ *<stepName>next</stepName>\r\n\ *</step>)
      

      …but that matches across the test case border, so it fails to solve the problem.

      If anyone could shed some light on how I could solve this, I’d be very happy.
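
A note on the two attempts above, offered as a sketch rather than a definitive answer. In `.(?!</units>)*?` the quantifier `*?` applies to the lookahead instead of the dot. In `.*?(?!</units>)` the lookahead is tested only once, at the position where `.*?` stops, and the text that follows there is the step block, so the test always passes. A common workaround is to test the lookahead before every single character: `(?:(?!</units>).)*?`. The Python example below uses Python's `re` module, not Notepad++'s engine; the shortened `SAMPLE` text, the `\s*` in `STEP` and the helper name `first_test_name` are assumptions made for illustration.

```python
import re

# Shortened stand-in for the two test cases in the question (hypothetical data).
SAMPLE = """\
<unit-test name="3d.">
    <units>
        <commit>
            <parameter>type_field2</parameter>
        </commit>
    </units>
</unit-test>
<unit-test name="3e.">
    <units>
        <step>
            <stepName>next</stepName>
        </step>
    </units>
</unit-test>
"""

# The "next step" block, with \s* standing in for the explicit \r\n and spaces.
STEP = r"[ \t]*<step>\s*<stepName>next</stepName>\s*</step>"

# Plain lazy dot: free to run past </units> into the following test case.
crossing = re.compile(r"(<unit-test.*?)(" + STEP + ")", re.DOTALL)

# Tempered dot: the lookahead is checked before EACH character is consumed,
# so the run can never step onto the start of "</units>".
tempered = re.compile(r"(<unit-test(?:(?!</units>).)*?)(" + STEP + ")", re.DOTALL)


def first_test_name(pattern, text):
    """Return the name attribute of the test case where the first match starts."""
    match = pattern.search(text)
    if match is None:
        return None
    return re.search(r'name="([^"]*)"', match.group(1)).group(1)


print(first_test_name(crossing, SAMPLE))  # match starts in the first test case
print(first_test_name(tempered, SAMPLE))  # match starts in the test case owning the step
```

With the tempered version, the match can only begin in a test case that itself contains the step, because the search gives up on any `<unit-test` whose `</units>` comes first.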

      • Alan Kilborn @Elias Mossholm

        @Elias-Mossholm said in How to match all content between two XML tags except if a certain tag occurs between them?:

        I’d like to find all test cases with at least 8 step changes (a step change is characterized by a specific set of tags (<step> etc.), see second test case in example below) and remove the last 3 step changes.

        Maybe I’m just not understanding…
        You say “at least 8 step changes” but then you show sample data that has only one step (defined by <step>...</step>)?
        I suppose it can be solved from the description, but it is odd that your sample data would not be affected by the solution.

        • Elias Mossholm

          You say “at least 8 step changes” but then you show sample data that has only one step (defined by <step>…</step>)?

          Maybe that was a bit unclear.

          The number of step changes I need to have in the match should be easy to specify with a {n,n} quantifier or by just repeating the same segment n times. I’ve already built regex strings that match that correctly.

          The problem I haven’t been able to solve is how to find only matches that don’t include the end tag </units>.

          • guy038

            Hello, @elias-mossholm, @alan-kilborn and All,

            Here is a regex which selects the entire contents of any <unit-test name="xxxx">.......</unit-test> block that contains between N and M <step>............</step> blocks (no fewer and no more), like the ones below:

                    <step>
                        <stepName>XXXX</stepName>
                    </step>
            

            or

                    <step><stepName>YYYY</stepName></step>
            

            SEARCH (?s)^\h*<unit-test(?:(((?!</?unit-test|</?step>).)+?)<step>(?1)</step>){N,M}(?1)</unit-test>\R

            Of course, you must replace {N,M} with the appropriate quantifier: {2,4}, {1,}, {0,3}, {2} or even {0}!

            You may try this regex against the sample text below:

            <unit-test name="3d.">
            <!--                        1 BLOCK -->
                <units>
                    <multiset>
                        <set action="commit" parameter="variant_field" value="variant1"/>
                        <set action="commit" parameter="variant_field2" value="variant2"/>
                    </multiset>
                    <assert-param-value>
                        <parameter>type_field</parameter>
                        <value>type1</value>
                        <operation>=</operation>
                    </assert-param-value>
                    <commit>
                        <parameter>type_field2</parameter>
                        <value>myType</value>
                        <accept>false</accept>
                    </commit>
                    <step><stepName>AAAA</stepName></step>
                    <assert-param-hidden>
                        <parameter>type_field5</parameter>
                        <hidden>true</hidden>
                    </assert-param-hidden>
                </units>
            </unit-test>
            <unit-test name="3e.">
            <!--                        4 BLOCKS -->
                <units>
                    <assert-attribute-value>
                        <attribute>part.subpart.subsubpart.block[3].anotherPart.type</attribute>
                        <value>type7</value>
                        <operation>=</operation>
                    </assert-attribute-value>
                    <step>
                        <stepName>BBBB</stepName>
                    </step>
                    <commit>
                        <parameter>part_field</parameter>
                        <value>part1</value>
                        <accept>false</accept>
                    </commit>
                    <assert-attribute-value>
                        <attribute>part.subpart.subsubpart.block[3].anotherPart.type</attribute>
                        <value>type7</value>
                        <operation>=</operation>
                    </assert-attribute-value>
                    <step>
                        <stepName>CCCC</stepName>
                    </step>
                    <commit>
                        <parameter>part_field</parameter>
                        <value>part1</value>
                        <accept>false</accept>
                    </commit>
                    <assert-attribute-value>
                        <attribute>part.subpart.subsubpart.block[3].anotherPart.type</attribute>
                        <value>type7</value>
                        <operation>=</operation>
                    </assert-attribute-value>
                    <step>
                        <stepName>DDDD</stepName>
                    </step>
                    <commit>
                        <parameter>part_field</parameter>
                        <value>part1</value>
                        <accept>false</accept>
                    </commit>
                    <assert-attribute-value>
                        <attribute>part.subpart.subsubpart.block[3].anotherPart.type</attribute>
                        <value>type7</value>
                        <operation>=</operation>
                    </assert-attribute-value>
                    <step>
                        <stepName>EEEE</stepName>
                    </step>
                    <commit>
                        <parameter>part_field</parameter>
                        <value>part1</value>
                        <accept>false</accept>
                    </commit>
                </units>
            </unit-test>
            <unit-test name="3f.">
            <!--                        0 BLOCK -->
                <units>
                    <multiset>
                        <set action="commit" parameter="variant_field" value="variant1"/>
                        <set action="commit" parameter="variant_field2" value="variant2"/>
                    </multiset>
                    <commit>
                        <parameter>type_field2</parameter>
                        <value>myType</value>
                        <accept>false</accept>
                    </commit>
                    <assert-param-hidden>
                        <parameter>type_field5</parameter>
                        <hidden>true</hidden>
                    </assert-param-hidden>
                </units>
            </unit-test>
            <unit-test name="3g.">
            <!--                        2 consecutive BLOCKS -->
                <units>
                    <multiset>
                        <set action="commit" parameter="variant_field" value="variant1"/>
                        <set action="commit" parameter="variant_field2" value="variant2"/>
                    </multiset>
                    <step>
                        <stepName>FFFF</stepName>
                    </step>
                    <step>
                        <stepName>GGGG</stepName>
                    </step>
                    <assert-param-value>
                        <parameter>type_field</parameter>
                        <value>type1</value>
                        <operation>=</operation>
                    </assert-param-value>
                    <commit>
                        <parameter>type_field2</parameter>
                        <value>myType</value>
                        <accept>false</accept>
                    </commit>
                    <assert-param-hidden>
                        <parameter>type_field5</parameter>
                        <hidden>true</hidden>
                    </assert-param-hidden>
                </units>
            </unit-test>
            

            • Before each <units> line, I inserted an XML comment noting the number of <step>......</step> blocks in that <unit-test name="xxxx">......</unit-test> block of my example

            • This rather complex regex can be decomposed, using the free-spacing mode (?x), as follows:

            (?xs)                             # FREE-SPACING and SINGLE-LINE modes
            ^\h*<unit-test                    # String "<unit-test" at the START of a line, optionally preceded by HORIZONTAL BLANK character(s)
            (?:                               # Beginning of a NON-CAPTURING group
            (((?!</?unit-test|</?step>).)+?)  # SHORTEST NON-0 range of any char, NOT crossing "</?unit-test" nor "</?step>", and stored as GROUP 1
            <step>                            # ...till the STRING "<step>"
            (?1)                              # CALL of the regex SUB-ROUTINE, stored in GROUP 1, so the regex :  ((?!</?unit-test|</?step>).)+?
            </step>                           # ...till the STRING "</step>"
            ){N,M}                            # DESIRED number of "<step>...</step>" ranges, between N and M, in a SINGLE "<unit-test...</unit-test>" block
            (?1)                              # CALL of the regex SUB-ROUTINE, stored in GROUP 1, so the regex :  ((?!</?unit-test|</?step>).)+?
            </unit-test>\R                    # STRING "</unit-test>" with its LINE-BREAK
            

            Remark: in order to shorten the overall regex, the part ((?!</?unit-test|</?step>).)+?, stored as group 1, which represents the shortest non-empty range of characters that does not cross a </?unit-test or </?step> string, is re-used twice, thanks to the sub-routine call syntax (?1)!
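
For readers who want to try the same idea outside Notepad++, here is a minimal Python sketch of the regex above. It is an adaptation under stated assumptions, not the exact expression: Python's `re` module has no `(?1)` sub-routine call, `\h` or `\R`, so group 1 is written out in full each time, `\h` becomes `[ \t]` and `\R` becomes an explicit line-break alternation. The function names `step_count_pattern` and `matching_names` and the shortened `SAMPLE` text are hypothetical.

```python
import re


def step_count_pattern(n, m=None):
    """Compile a pattern for <unit-test> blocks holding n to m <step> blocks.

    m=None means "n or more", i.e. the {n,} quantifier.
    """
    # Written-out equivalent of group 1: shortest non-empty run of characters
    # that never steps onto <unit-test, </unit-test, <step> or </step>.
    gap = r"(?:(?!</?unit-test|</?step>).)+?"
    quantifier = "{%d,%s}" % (n, "" if m is None else m)
    return re.compile(
        r"^[ \t]*<unit-test"
        r"(?:" + gap + r"<step>" + gap + r"</step>)" + quantifier
        + gap + r"</unit-test>(?:\r\n|\n|\r)",
        re.DOTALL | re.MULTILINE,
    )


def matching_names(text, n, m=None):
    """Return the name attribute of every test case the pattern selects."""
    pattern = step_count_pattern(n, m)
    return [
        re.search(r'name="([^"]*)"', hit.group(0)).group(1)
        for hit in pattern.finditer(text)
    ]


# Shortened stand-in for the sample text above: 1, 4, 0 and 2 step blocks.
SAMPLE = """\
<unit-test name="3d.">
    <units>
        <step><stepName>AAAA</stepName></step>
        <commit><value>myType</value></commit>
    </units>
</unit-test>
<unit-test name="3e.">
    <units>
        <step><stepName>BBBB</stepName></step>
        <commit/>
        <step><stepName>CCCC</stepName></step>
        <commit/>
        <step><stepName>DDDD</stepName></step>
        <commit/>
        <step><stepName>EEEE</stepName></step>
    </units>
</unit-test>
<unit-test name="3f.">
    <units>
        <commit><value>myType</value></commit>
    </units>
</unit-test>
<unit-test name="3g.">
    <units>
        <step>
            <stepName>FFFF</stepName>
        </step>
        <step>
            <stepName>GGGG</stepName>
        </step>
    </units>
</unit-test>
"""

print(matching_names(SAMPLE, 2, 4))
```

Calling `matching_names(SAMPLE, 1, 1)`, `matching_names(SAMPLE, 0, 0)` or `matching_names(SAMPLE, 2)` corresponds to the quantifiers {1}, {0} and {2,} mentioned earlier.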

            Best Regards,

            guy038

            • Elias Mossholm

              Thank you @guy038!

              • DesAWSume @guy038

                Hi @guy038

                Is there a way to find the pattern below with a regex?

                Log file

                <Text>
                    ns:="https://www.example.com"
                    <Error>
                        <id>ex8359693589435834583985934583495</id>
                        <ErrorItem>
                            <id>slak;jdk;asjdklasjdklasjdfhkldj;sfjdsf</id>
                            <code>404</code>
                            <description>External>  failed messages multiple line of detials </description>
                            <reference>/</reference>
                        </ErrorItem>
                    </Error>
                    <InformationLog>
                        <cccpInformation>
                            <description>External>  failed messages multiple line of detials 2 </description>
                            <Place>
                                <id>988475748848758478545</id>
                            </Place>
                        </cccpInformation>
                    </InformationLog>
                </Text>
                

                Basically, I only want to capture the <description> tag only in
                <ErrorItem></ErrorItem>

                and nothing else. The logs also contain <description> tags at other levels of the tag hierarchy.

                I can achieve some basic matching using something like

                (?s)<ErrorItem>(.*?)<\/description>
                

                but it will select everything from <ErrorItem> up to the closing </description>, not just the description itself

                • Alan Kilborn @DesAWSume

                  @DesAWSume said in How to match all content between two XML tags except if a certain tag occurs between them?:

                  I only want to capture the <description> tag only in
                  <ErrorItem></ErrorItem>

                  See HERE.
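
The linked answer is not reproduced here, so the following is only one possible approach and not necessarily the one referred to. The same "do not cross this tag" idea from earlier in the thread can be applied: start at `<ErrorItem>`, advance without crossing `</ErrorItem>`, and capture only what sits between `<description>` and `</description>`. The sketch uses Python's `re` module; the shortened `LOG` text and the names `INSIDE`, `ERROR_DESCRIPTION` and `error_descriptions` are assumptions made for illustration.

```python
import re

# Shortened, hypothetical stand-in for the log in the post above.
LOG = """\
<Text>
    <Error>
        <id>ex83</id>
        <ErrorItem>
            <id>slak</id>
            <code>404</code>
            <description>External>  failed messages 1 </description>
            <reference>/</reference>
        </ErrorItem>
    </Error>
    <InformationLog>
        <cccpInformation>
            <description>External>  failed messages 2 </description>
        </cccpInformation>
    </InformationLog>
</Text>
"""

# Shortest run of characters that never steps onto "</ErrorItem>".
INSIDE = r"(?:(?!</ErrorItem>).)*?"

# Group 1 holds the description text; everything before it is matched
# but not captured, and the whole match stays inside one <ErrorItem>.
ERROR_DESCRIPTION = re.compile(
    r"<ErrorItem>" + INSIDE + r"<description>(" + INSIDE + r")</description>",
    re.DOTALL,
)


def error_descriptions(text):
    """Return the stripped text of the first <description> in each <ErrorItem>."""
    return [found.strip() for found in ERROR_DESCRIPTION.findall(text)]


print(error_descriptions(LOG))
```

Note that this picks up only the first `<description>` of each `<ErrorItem>`; the `<description>` under `<InformationLog>` is never reached because the pattern must start at `<ErrorItem>`.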

                  The Community of users of the Notepad++ text editor.