How to match all content between two XML tags except if a certain tag occurs between them?
-
I have an XML file used to define test cases for a specific program. I’d like to find all test cases with at least 8 step changes (a step change is characterized by a specific set of tags (<step> etc.), see the second test case in the example below) and remove the last 3 step changes. There may be any tags between the first 5 step changes, but the match must stay within a single test case. A simplified version of this is to be able to find “[start of test case] followed by [any text except “end of test case”] followed by [next step]”, which is what I’m trying to solve below (unsuccessfully).
Example text to search in:
<unit-test name="3d.">
  <units>
    <multiset>
      <set action="commit" parameter="variant_field" value="variant1"/>
      <set action="commit" parameter="variant_field2" value="variant2"/>
    </multiset>
    <assert-param-value>
      <parameter>type_field</parameter>
      <value>type1</value>
      <operation>=</operation>
    </assert-param-value>
    <commit>
      <parameter>type_field2</parameter>
      <value>myType</value>
      <accept>false</accept>
    </commit>
    <assert-param-hidden>
      <parameter>type_field5</parameter>
      <hidden>true</hidden>
    </assert-param-hidden>
  </units>
</unit-test>
<unit-test name="3e.">
  <units>
    <assert-attribute-value>
      <attribute>part.subpart.subsubpart.block[3].anotherPart.type</attribute>
      <value>type7</value>
      <operation>=</operation>
    </assert-attribute-value>
    <step>
      <stepName>next</stepName>
    </step>
    <commit>
      <parameter>part_field</parameter>
      <value>part1</value>
      <accept>false</accept>
    </commit>
  </units>
</unit-test>

The following expression successfully matches “[start of test case] followed by [any text] followed by [next step]”, but since I left out <except “end of test case”>, it crosses the test case border if I search from the beginning of the above text.
Note that I use the “. matches newline” setting in the search dialog.

(<unit-test.*?)( *<step>\r\n\ *<stepName>next</stepName>\r\n\ *</step>)

I tried adding a negative lookahead (?!</units>) (strictly speaking, </units> is the second-last tag, but it only occurs at the end of each test case, so it should work just as well as </unit-test> would):

(<unit-test.(?!</units>)*?)( *<step>\r\n\ *<stepName>next</stepName>\r\n\ *</step>)

…but that is an invalid expression according to Notepad++.
This being the first time I’ve used negative lookaheads (or any assertions), I tried rearranging the above expression so that the negative assertion comes after the full .*? instead of between the . and the *?:

(<unit-test.*?(?!</units>))( *<step>\r\n\ *<stepName>next</stepName>\r\n\ *</step>)

…but that matches across the test case border, so it fails to solve the problem.
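Both attempts fail for a related reason, which a minimal sketch in Python’s `re` module (not Notepad++’s engine, but the same construct appears in guy038’s Notepad++ answer below) can show. In `.(?!</units>)*?` the `*?` applies to the zero-width lookahead rather than to the dot, which is presumably why Notepad++ rejects it. In `.*?(?!</units>)` the lookahead is tested only once, after the lazy run has already finished. Putting the lookahead and the dot in one repeated group, `(?:(?!</units>).)*?`, makes the check happen before every character is consumed, so the run cannot step over `</units>`. The sample is a cut-down, hypothetical version of the example text, and `first_test_name` is a made-up helper.

```python
import re

# Cut-down, hypothetical sample: only the second test case contains a step.
SAMPLE = """\
<unit-test name="3d.">
  <units>
    <commit>
      <parameter>type_field2</parameter>
    </commit>
  </units>
</unit-test>
<unit-test name="3e.">
  <units>
    <step>
      <stepName>next</stepName>
    </step>
  </units>
</unit-test>
"""

# The step block from the question; \r?\n accepts Windows or Unix line breaks.
STEP = r" *<step>\r?\n *<stepName>next</stepName>\r?\n *</step>"

# Lookahead after the lazy run: checked once, at the end,
# so the run itself is free to cross </units>.
naive = re.compile(r"(<unit-test.*?(?!</units>))(" + STEP + ")", re.S)

# Lookahead inside the repeated group: checked before every character,
# so the run can never consume the "<" of a </units> tag.
tempered = re.compile(r"(<unit-test(?:(?!</units>).)*?)(" + STEP + ")", re.S)


def first_test_name(match):
    """Hypothetical helper: name attribute of the test case where a match starts."""
    return re.match(r'<unit-test name="([^"]*)"', match.group(1)).group(1)


if __name__ == "__main__":
    print(first_test_name(naive.search(SAMPLE)))     # 3d.  (crossed into 3e.)
    print(first_test_name(tempered.search(SAMPLE)))  # 3e.
```

The naive match starts in test case 3d. and runs on into 3e. to reach the step, while the tempered version can only succeed when the match starts at 3e.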
If anyone could shed some light on how I could solve this, I’d be very happy.
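For the overall goal in the first paragraph (find test cases with at least 8 step changes and remove the last 3), a script can also split the job into two stages instead of using a single search-and-replace. A minimal sketch using Python’s `re`, assuming that each step change is a `<step>...</step>` element, that `<unit-test>` elements are never nested, and that a step block may be removed together with its indentation and trailing line break; `drop_last_steps` and its parameters are hypothetical names.

```python
import re

# One whole test case; lazy, so it stops at the first closing tag.
UNIT_TEST = re.compile(r"<unit-test\b.*?</unit-test>", re.S)
# One step change, with its leading indentation and trailing line break.
STEP = re.compile(r"[ \t]*<step>.*?</step>[ \t]*(?:\r?\n)?", re.S)


def drop_last_steps(xml_text, min_steps=8, drop=3):
    """Remove the last `drop` step blocks from each test case with at least `min_steps`."""

    def fix(test_match):
        test = test_match.group(0)
        steps = list(STEP.finditer(test))
        if len(steps) < min_steps:
            return test  # too few steps: leave this test case unchanged
        # Cut from the end backwards so the earlier offsets stay valid.
        for step in reversed(steps[max(len(steps) - drop, 0):]):
            test = test[: step.start()] + test[step.end():]
        return test

    return UNIT_TEST.sub(fix, xml_text)


if __name__ == "__main__":
    steps = "".join(
        f"    <step>\n      <stepName>s{i}</stepName>\n    </step>\n" for i in range(1, 10)
    )
    sample = f'<unit-test name="long">\n  <units>\n{steps}  </units>\n</unit-test>\n'
    print(drop_last_steps(sample).count("<step>"))  # 6
```

Counting the steps per test case in code keeps the regexes short; the trade-off is that this runs as a script rather than inside the Notepad++ search dialog.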
-
@Elias-Mossholm said in How to match all content between two XML tags except if a certain tag occurs between them?:
I’d like to find all test cases with at least 8 step changes (a step change is characterized by a specific set of tags (<step> etc.), see second test case in example below) and remove the last 3 step changes.
Maybe I’m just not understanding…
You say “at least 8 step changes” but then you show sample data that has only one step (defined by <step>...</step>)?
I suppose it can be solved from the description, but it is odd that your sample data would not be affected by the solution.
-
You say “at least 8 step changes” but then you show sample data that has only one step (defined by <step>…</step>)?
Maybe that was a bit unclear.
The number of step changes I need to have in the match should be easy to specify with a {n,n} quantifier or by just repeating the same segment n times. I’ve already built regex strings that match that correctly. The problem I haven’t been able to solve is how to only find matches that don’t include the end tag </units>.
-
Hello, @elias-mossholm, @alan-kilborn and All,
Here is a regex which selects the whole contents of any <unit-test name="xxxx">.......</unit-test> block which contains ONLY between N and M block(s) <step>............</step>, like below:

<step>
  <stepName>XXXX</stepName>
</step>

or

<step><stepName>YYYY</stepName></step>

SEARCH (?s)^\h*<unit-test(?:(((?!</?unit-test|</?step>).)+?)<step>(?1)</step>){N,M}(?1)</unit-test>\R

Of course, you must replace the N and M variables with the appropriate integers: {2,4}, {1,}, {0,3}, {2} or even {0}!

You may try this regex against the sample text below:
<unit-test name="3d.">
  <!-- 1 BLOCK -->
  <units>
    <multiset>
      <set action="commit" parameter="variant_field" value="variant1"/>
      <set action="commit" parameter="variant_field2" value="variant2"/>
    </multiset>
    <assert-param-value>
      <parameter>type_field</parameter>
      <value>type1</value>
      <operation>=</operation>
    </assert-param-value>
    <commit>
      <parameter>type_field2</parameter>
      <value>myType</value>
      <accept>false</accept>
    </commit>
    <step><stepName>AAAA</stepName></step>
    <assert-param-hidden>
      <parameter>type_field5</parameter>
      <hidden>true</hidden>
    </assert-param-hidden>
  </units>
</unit-test>
<unit-test name="3e.">
  <!-- 4 BLOCKS -->
  <units>
    <assert-attribute-value>
      <attribute>part.subpart.subsubpart.block[3].anotherPart.type</attribute>
      <value>type7</value>
      <operation>=</operation>
    </assert-attribute-value>
    <step>
      <stepName>BBBB</stepName>
    </step>
    <commit>
      <parameter>part_field</parameter>
      <value>part1</value>
      <accept>false</accept>
    </commit>
    <assert-attribute-value>
      <attribute>part.subpart.subsubpart.block[3].anotherPart.type</attribute>
      <value>type7</value>
      <operation>=</operation>
    </assert-attribute-value>
    <step>
      <stepName>CCCC</stepName>
    </step>
    <commit>
      <parameter>part_field</parameter>
      <value>part1</value>
      <accept>false</accept>
    </commit>
    <assert-attribute-value>
      <attribute>part.subpart.subsubpart.block[3].anotherPart.type</attribute>
      <value>type7</value>
      <operation>=</operation>
    </assert-attribute-value>
    <step>
      <stepName>DDDD</stepName>
    </step>
    <commit>
      <parameter>part_field</parameter>
      <value>part1</value>
      <accept>false</accept>
    </commit>
    <assert-attribute-value>
      <attribute>part.subpart.subsubpart.block[3].anotherPart.type</attribute>
      <value>type7</value>
      <operation>=</operation>
    </assert-attribute-value>
    <step>
      <stepName>EEEE</stepName>
    </step>
    <commit>
      <parameter>part_field</parameter>
      <value>part1</value>
      <accept>false</accept>
    </commit>
  </units>
</unit-test>
<unit-test name="3f.">
  <!-- 0 BLOCK -->
  <units>
    <multiset>
      <set action="commit" parameter="variant_field" value="variant1"/>
      <set action="commit" parameter="variant_field2" value="variant2"/>
    </multiset>
    <commit>
      <parameter>type_field2</parameter>
      <value>myType</value>
      <accept>false</accept>
    </commit>
    <assert-param-hidden>
      <parameter>type_field5</parameter>
      <hidden>true</hidden>
    </assert-param-hidden>
  </units>
</unit-test>
<unit-test name="3g.">
  <!-- 2 consecutive BLOCKS -->
  <units>
    <multiset>
      <set action="commit" parameter="variant_field" value="variant1"/>
      <set action="commit" parameter="variant_field2" value="variant2"/>
    </multiset>
    <step>
      <stepName>FFFF</stepName>
    </step>
    <step>
      <stepName>GGGG</stepName>
    </step>
    <assert-param-value>
      <parameter>type_field</parameter>
      <value>type1</value>
      <operation>=</operation>
    </assert-param-value>
    <commit>
      <parameter>type_field2</parameter>
      <value>myType</value>
      <accept>false</accept>
    </commit>
    <assert-param-hidden>
      <parameter>type_field5</parameter>
      <hidden>true</hidden>
    </assert-param-hidden>
  </units>
</unit-test>
- Before each line <units>, I inserted an XML comment noting the number of <step>......</step> blocks in each <unit-test name="xxxx">......</unit-test> block of my example.

- This regex, quite complex, can be decomposed using the free-spacing mode (?x), as:
(?xs)                                  # FREE-SPACING and SINGLE-LINE modes
^\h*<unit-test                         # String "<unit-test", preceded with some HORIZONTAL BLANK character(s)
(?:                                    # Beginning of a NON-CAPTURING group
  (((?!</?unit-test|</?step>).)+?)     # SHORTEST NON-0 range of any char, NOT crossing "</?unit-test" nor "</?step>", and stored as GROUP 1
  <step>                               # ...till the STRING "<step>"
  (?1)                                 # CALL of the regex SUB-ROUTINE, stored in GROUP 1, so the regex : ((?!</?unit-test|</?step>).)+?
  </step>                              # ...till the STRING "</step>"
){N,M}                                 # DESIRED number of "<step>...</step>" ranges, between N and M, in a SINGLE "<unit-test...</unit-test>" block
(?1)                                   # CALL of the regex SUB-ROUTINE, stored in GROUP 1, so the regex : ((?!</?unit-test|</?step>).)+?
</unit-test>\R                         # STRING "</unit-test>" with its LINE-BREAK

Remark: just note that, in order to shorten the overall regex, the part ((?!</?unit-test|</?step>).)+?, stored as group 1, which represents the shortest non-null range of any char not crossing the </?unit-test string nor the </?step> string, is re-used two times, thanks to the sub-routine call syntax (?1)!

Best Regards,
guy038
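Python’s built-in `re` module has no `(?1)` subroutine call and no `\h` or `\R`, so a minimal sketch of the same search outside Notepad++ has to write group 1 out in full wherever `(?1)` appears, use `[ \t]` for `\h`, and use a simplified line-break alternation for `\R`. It also needs `re.M`, because `^` in Notepad++ matches at the start of every line. The trimmed sample and the helper names `step_count_regex` and `names` are hypothetical.

```python
import re

# Group 1 of the Notepad++ regex, spelled out because `re` has no (?1):
# the shortest non-empty run of characters not crossing <unit-test,
# </unit-test, <step> or </step>.
G1 = r"(?:(?!</?unit-test|</?step>).)+?"


def step_count_regex(quantifier):
    """Build the search for a quantifier string such as '{2,4}', '{1,}' or '{0}'."""
    return re.compile(
        r"^[ \t]*<unit-test"                        # ^\h*<unit-test
        + r"(?:" + G1 + r"<step>" + G1 + r"</step>)" + quantifier
        + G1 + r"</unit-test>(?:\r\n|\n|\r)",      # simplified \R
        re.S | re.M,                                # (?s), and ^ at each line start
    )


SAMPLE = """\
<unit-test name="a">
  <units>
  </units>
</unit-test>
<unit-test name="b">
  <units>
    <step><stepName>A</stepName></step>
  </units>
</unit-test>
<unit-test name="c">
  <units>
    <step><stepName>B</stepName></step>
    <step><stepName>C</stepName></step>
  </units>
</unit-test>
"""


def names(quantifier):
    """Names of the test cases whose step count satisfies `quantifier`."""
    return [
        re.search(r'name="([^"]*)"', m.group(0)).group(1)
        for m in step_count_regex(quantifier).finditer(SAMPLE)
    ]


if __name__ == "__main__":
    for q in ("{0}", "{1}", "{2}", "{1,}", "{0,3}"):
        print(q, names(q))
```

Because group 1 can never cross `<step>`, the text after the last repetition cannot hide extra steps, so a test case only matches when its total step count lies within the quantifier.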
-
Thank you @guy038!
-
Hi @guy038
Is there a way to find the pattern below with a regex?
Log file:
<Text> ns:="https://www.example.com"
  <Error>
    <id>ex8359693589435834583985934583495</id>
    <ErrorItem>
      <id>slak;jdk;asjdklasjdklasjdfhkldj;sfjdsf</id>
      <code>404</code>
      <description>External> failed messages multiple line of detials </description>
      <reference>/</reference>
    </ErrorItem>
  </Error>
  <InformationLog>
    <cccpInformation>
      <description>External> failed messages multiple line of detials 2 </description>
      <Place>
        <id>988475748848758478545</id>
      </Place>
    </cccpInformation>
  </InformationLog>
</Text>
Basically, I only want to capture the <description> tag inside <ErrorItem></ErrorItem> and nothing else; the logs also contain <description> tags at other levels.
I can achieve some basic matching using something like (?s)<ErrorItem>(.*?)<\/description>, but it selects everything from <ErrorItem> up to the first </description>, not just the <description> tag.
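The same idea as in guy038’s answer applies here: make the text between `<ErrorItem>` and `<description>` unable to cross `</ErrorItem>`. A minimal sketch in Python’s `re`; the pattern only uses constructs that also appear in guy038’s Notepad++ regex. The log is a trimmed version of the one above with a hypothetical second `<ErrorItem>` that has no description, and `error_descriptions` is a made-up helper.

```python
import re

# Trimmed log; the second <ErrorItem> (no description) is hypothetical.
LOG = """\
<Text> ns:="https://www.example.com"
  <Error>
    <ErrorItem>
      <code>404</code>
      <description>External> failed messages
      on several lines</description>
    </ErrorItem>
    <ErrorItem>
      <code>500</code>
    </ErrorItem>
  </Error>
  <InformationLog>
    <cccpInformation>
      <description>not wanted</description>
    </cccpInformation>
  </InformationLog>
</Text>
"""

PATTERN = re.compile(
    r"<ErrorItem>"
    r"(?:(?!</ErrorItem>).)*?"            # any text, but never past </ErrorItem>
    r"<description>(.*?)</description>",  # group 1: the description text
    re.S,                                 # like (?s): . also matches line breaks
)


def error_descriptions(text):
    """Hypothetical helper: texts of the <description> elements inside an <ErrorItem>."""
    return [m.group(1) for m in PATTERN.finditer(text)]


if __name__ == "__main__":
    print(error_descriptions(LOG))
    # For comparison, the pattern from the question also reaches the
    # InformationLog description, starting from the second <ErrorItem>.
    print(len(re.findall(r"<ErrorItem>(.*?)</description>", LOG, re.S)))  # 2
```

Use `m.group(0)` instead of `m.group(1)` if the whole `<ErrorItem>...</description>` range is wanted rather than only the description text.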
-
@DesAWSume said in How to match all content between two XML tags except if a certain tag occurs between them?:
I only want to capture the <description> tag only in <ErrorItem></ErrorItem>

See HERE.