How to Print Pretty with missing close tags.
-
I am looking at a Quicken QFX log file that is in a sort of XML type format. The format has many missing End tags so this causes the XML Tools - Pretty Print to indent nearly forever.
Is there a way to align the Start and End tags that are present?
For example in the following code how do I align the bolded lines:
<OFX>
<SIGNONMSGSRQV1>
**<SONRQ>**
<DTCLIENT>20250520104016.123[-7:MST]
<USERID>anonymous00000000000000000000000
<USERPASS>X
<GENUSERKEY>N
<LANGUAGE>ENG
<APPID>QWIN
<APPVER>2700
**</SONRQ>**
</SIGNONMSGSRQV1>
<INTU.BRANDMSGSRQV1>
<INTU.BRANDTRNRQ>
<TRNUID>19FFC8F0-7EF9-1000-BC8D-909811990026
<INTU.BRANDRQ>
I am running on Win 11, latest update and Np++ v8.8.3
-
@Doctor-Rashir said in How to Print Pretty with missing close tags.:
I am looking at a Quicken QFX log file that is in a sort of XML type format. The format has many missing End tags so this causes the XML Tools - Pretty Print to indent nearly forever.
Is there a way to align the Start and End tags that are present?
XML Tools is designed to work with well-formed XML. If the file is not well-formed (i.e., it has unclosed tags), it’s just too much of an edge case. It’s doubtful there’s any toolmaker out there who could figure out a way to “pretty print” a seemingly random mixture of closed and unclosed tags in any meaningful way.
If you were to unindent everything (Ctrl+A, then Shift+TAB until it’s gone, or search for ^\h+ and replace with nothing), then if you knew in advance which tags (like <SONRQ>) had closing pairs, you could use the zone-of-text regex formula from our FAQ, as:
- FIND = (?-si:<SONRQ\b|(?!\A)\G)(?s-i:(?!</SONRQ\b).)*?\K(?-si:^(?!\h*</SONRQ))
- REPLACE = \t
- REPLACE ALL
If I do three steps: unindent, formula(SONRQ) and formula(SIGNONMSGSRQV1), then with your example data, I get
<OFX>
<SIGNONMSGSRQV1>
    <SONRQ>
        <DTCLIENT>20250520104016.123[-7:MST]
        <USERID>anonymous00000000000000000000000
        <USERPASS>X
        <GENUSERKEY>N
        <LANGUAGE>ENG
        <APPID>QWIN
        <APPVER>2700
    </SONRQ>
</SIGNONMSGSRQV1>
<INTU.BRANDMSGSRQV1>
<INTU.BRANDTRNRQ>
<TRNUID>19FFC8F0-7EF9-1000-BC8D-909811990026
<INTU.BRANDRQ>
I don’t know how many other closed tags there are in your file, so I don’t know whether that’s practical for you or not. But it’s the best I can come up with for now, without invoking a full-on programming language (at which point, it could be done in the contents of the Notepad++ window using a plugin like PythonScript, or it could just be done at the command-line with whatever programming language you wanted to use, without needing the file to be open in Notepad++, and thus make it off-topic here).
I did try to make use of a numbered or named capture group in the BSR section and use a backreference to make the BSR and FR invoke those (see the FAQ for the meaning of BSR / ESR / FR), rather than having to know in advance the names of all the tags… but I couldn’t get those backreference versions to work.
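For anyone who would rather do the per-tag step outside Notepad++, here is a minimal sketch in plain Python of what one pass of the formula does, assuming the text is already one tag per line. Python's standard re module has no \K or \G, so this swaps the regex for a line-by-line loop; indent_zone is a hypothetical helper name, not something from the FAQ or any plugin.

```python
import re


def indent_zone(lines, tag, indent="\t"):
    """Indent every line after an opening <tag> up to (not including) </tag>.

    Assumption: one tag per line, as after the "unindent" step.
    """
    opener = re.compile(r"<" + re.escape(tag) + r"\b")
    closer = re.compile(r"[ \t]*</" + re.escape(tag) + r"\b")
    in_zone = False
    out = []
    for line in lines:
        if in_zone and closer.match(line):
            in_zone = False  # the closing tag itself is left where it is
        if in_zone:
            out.append(indent + line)
        else:
            out.append(line)
            if opener.search(line):
                in_zone = True  # start indenting from the next line on
    return out


if __name__ == "__main__":
    sample = [
        "<OFX>",
        "<SIGNONMSGSRQV1>",
        "<SONRQ>",
        "<USERPASS>X",
        "</SONRQ>",
        "</SIGNONMSGSRQV1>",
        "<INTU.BRANDMSGSRQV1>",
    ]
    # Same order as in the post: inner tag first, then the outer one.
    for name in ("SONRQ", "SIGNONMSGSRQV1"):
        sample = indent_zone(sample, name)
    print("\n".join(sample))
```

Like the regex, it has to be run once per tag that you know has a closing pair, and each run adds one more level to whatever indentation is already there.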
-
@PeterJones
I really appreciate what you’ve posted. There are many closed tags. And many open tags.
I’m just trying to analyze the error I’m encountering with Quicken. I’ll look at what you propose, but I have to determine how much work it is to fix all of them, or just the ones important to my analysis of the log. Thanks again.
-
If you are willing to use the PythonScript plugin, the script below should do it (instructions found in our FAQ, here; I only tested with PythonScript 3, but I tried to write it so I think it’s compatible with the PythonScript 2 in the Plugins Admin; I recommend PythonScript 3).
Script: PrettyPrintBadXML.py
# encoding=utf-8
"""in response to https://community.notepad-plus-plus.org/topic/27254/

This will take malformed XML (many/most tags with no closing tag)
and pretty-print it so that each layer of closed tags indents its contents
"""
from Npp import editor
import re

editor.beginUndoAction()

sEOL = ('\r\n', '\r', '\n')[editor.getEOLMode()]

# First, one tag per line, no indentation
editor.rereplace(r'\s*<', sEOL + r'<', re.MULTILINE)

# get rid of extra newlines at beginning and end (but final line will end with EOL, so N++ shows empty last line)
editor.rereplace(r'\A\s+', '', re.MULTILINE)
editor.rereplace(r'\v+\z', sEOL, re.MULTILINE)

# figure out all the closing tags `</CLOSING>`
closers = {}
def trackClosingTags(m):
    global closers
    closers[m.group(1)] = True

editor.research(r'</(\w+)\s*>', trackClosingTags)

for tag in closers.keys():
    f = r'(?-si:<{0}\b|(?!\A)\G)(?s-i:(?!</{0}\b).)*?\K(?-si:^(?!\h*</{0}))'.format(tag)
    editor.rereplace(f, '\t', re.MULTILINE)

editor.endUndoAction()
INPUT FILE:
<OFX>
<SIGNONMSGSRQV1>
<SONRQ>
<DTCLIENT>20250520104016.123[-7:MST]
<USERID>anonymous00000000000000000000000
<USERPASS>X
<GENUSERKEY>N
<LANGUAGE>ENG
<APPID>QWIN
<APPVER>2700
</SONRQ>
</SIGNONMSGSRQV1>
<INTU.BRANDMSGSRQV1>
<INTU.BRANDTRNRQ>
<TRNUID>19FFC8F0-7EF9-1000-BC8D-909811990026
<INTU.BRANDRQ>
<FAKE>
<OTHER>
<TAG>
<FAKE>
<OTHER>
<EMBEDDED>
<FAKE>
<DEEPER>
<OTHER>
</DEEPER>
<OTHER>
</EMBEDDED>
</TAG>
OUTPUT:
<OFX>
<SIGNONMSGSRQV1>
    <SONRQ>
        <DTCLIENT>20250520104016.123[-7:MST]
        <USERID>anonymous00000000000000000000000
        <USERPASS>X
        <GENUSERKEY>N
        <LANGUAGE>ENG
        <APPID>QWIN
        <APPVER>2700
    </SONRQ>
</SIGNONMSGSRQV1>
<INTU.BRANDMSGSRQV1>
<INTU.BRANDTRNRQ>
<TRNUID>19FFC8F0-7EF9-1000-BC8D-909811990026
<INTU.BRANDRQ>
<FAKE>
<OTHER>
<TAG>
    <FAKE>
    <OTHER>
    <EMBEDDED>
        <FAKE>
        <DEEPER>
            <OTHER>
        </DEEPER>
        <OTHER>
    </EMBEDDED>
</TAG>
Essentially, what the script does:
- Puts each <XYZ> or </CCCC> starting on its own line, with no indentation
- Figures out all the </CCCC> closing tags (so it knows all the tags which will need to be indented)
- For each of those CCCC tags, does the indentation replacement I suggested in the last post
Since the indentation it does is cumulative, it will properly nest (as shown with my TAG...EMBEDDED...DEEPER hierarchy, for example).
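Because the indentation is cumulative, the same three steps can also be sketched as a single pass with a depth counter, in standalone Python with no Notepad++ involved. To be clear, this is a swapped-in technique and not what the script above does internally: it assumes the closed tags are properly nested and that a tag name which has a closing tag is never also used unclosed. The name pretty_print_unclosed is hypothetical, and the [\w.]+ pattern (which also accepts dots in tag names) is my own choice for this sketch.

```python
import re


def pretty_print_unclosed(text, indent="\t"):
    """Indent only the tags that have a closing partner; leave the rest flat."""
    # Step 1: one tag per line, no indentation.
    lines = re.sub(r"\s*<", "\n<", text).strip().split("\n")
    # Step 2: collect the names of every tag that is closed somewhere.
    closers = set(re.findall(r"</([\w.]+)\s*>", text))
    # Step 3: walk the lines, going one level deeper after each opener of a
    # closed tag and one level shallower at its closer.
    depth = 0
    out = []
    for line in lines:
        m = re.match(r"<(/?)([\w.]+)", line)
        name = m.group(2) if m else None
        is_close = bool(m and m.group(1))
        if name in closers and is_close:
            depth = max(depth - 1, 0)  # closer lines up with its opener
        out.append(indent * depth + line)
        if name in closers and not is_close:
            depth += 1
    return "\n".join(out)


if __name__ == "__main__":
    sample = "<OFX> <SONRQ> <USERPASS>X <APPID>QWIN </SONRQ> <TRNUID>19FF"
    print(pretty_print_unclosed(sample, indent="    "))
```

The depth counter is simpler than one regex pass per tag, but it will drift if the same tag name shows up both with and without a closing tag, so treat it as a starting point rather than a replacement for the script.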
The script is designed so that after you run the script, if you do Ctrl+Z to UNDO, it will go back to the state before you ran the script.
If you would prefer to indent using spaces instead of the tab character, just change '\t' in the final editor.rereplace line to a string of spaces (for example, '    ' for four spaces), then save the script before running it.
The PythonScript FAQ explains everything you need to know for how to install the plugin (either PythonScript 2 or 3 [I recommend 3]), how to create the script by copying from this post, and how to run it.
note: the above script will also live at https://github.com/pryrt/nppStuff/blob/main/pythonScripts/nppCommunity/27xxx/p27254_PrettyPrintBadXml.py