    Regexp multiline text replace when care about line delimiters

    • I HayI
      I Hay
      last edited by PeterJones

      Fellow Notepad++ Users,

      Could you please help me with the following search-and-replace problem I am having?

      I have a list of names from a church graveyard. Each grave has a grave number, and there is one line for each person commemorated there, but only the first line for a stone carries the grave number. So I want to match a line that starts with a number, e.g. 24, and then apply that number to the start of each subsequent line for that grave marker (each of the subsequent lines has a space in the first of the tab-separated fields). Basically, I want to capture the digits at the start of the first line and replace the space at the start of the next line with that number, ideally for subsequent rows too, if possible. So it’s a multiline search problem, but I care about the line starts/ends.

      Here is the data I currently have (anonymised; the “before” data). I’ve replaced all tabs with \t, line endings (CRLF) with \r\n, and spaces with . :

      No\tName\tRelationship\tDate.of.Birth\tDate.of.Death\tAge\tOther.information\r\n
      1\tFred.BLOGGS\t.\t.\tMay.13.1800\t93.yrs\tBrotherton\r\n
      .\tLiz\twife\t.\tApr.39.1840\t64.yrs\t.\r\n
      .\tAnn.DIRGEWALL\t(husband.:.Jay)\t.\tJny.2.1955\t61.yrs\tGable.St\r\n
      2\tUnmarked.grave\t.\t.\t.\t.\t.\r\n
      3\tUnmarked.grave\t.\t.\t.\t.\t.\r\n
      4\tJack.GARDNER\t.\t.\tDec.5.1967\t75.yrs\tGrove.Rd.\r\n
      .\tJane\twife\t.\tSep.2.1969\t70.yrs\tBlackpool\r\n
      .\tMary.JONES\twife.of.Adam\t.\tJly.4.1930\t.\t.\r\n
      5\tHenry.ALBERT\t.\t.\tJny.4.1900\t68.yrs\tAbbeyrange\r\n
      .\tLola\twife\t.\tDec.28.1909\t76.yrs\t.\r\n
      .\tJack.HARBOR\tson.in.law\t.\tJan.29.1976\t49.yrs\t.\r\n
      .\tJulie\twife\t.\tMay.29.1999\t72.yrs\t.\r\n
      

      Here is how I would like that data to look (“after” data):

      No\tName\tRelationship\tDate.of.Birth\tDate.of.Death\tAge\tOther.information\r\n
      1\tFred.BLOGGS\t.\t.\tMay.13.1800\t93.yrs\tBrotherton\r\n
      1\tLiz\twife\t.\tApr.39.1840\t64.yrs\t.\r\n
      1\tAnn.DIRGEWALL\t(husband.:.Jay)\t.\tJny.2.1955\t61.yrs\tGable.St\r\n
      2\tUnmarked.grave\t.\t.\t.\t.\t.\r\n
      3\tUnmarked.grave\t.\t.\t.\t.\t.\r\n
      4\tJack.GARDNER\t.\t.\tDec.5.1967\t75.yrs\tGrove.Rd.\r\n
      4\tJane\twife\t.\tSep.2.1969\t70.yrs\tBlackpool\r\n
      4\tMary.JONES\twife.of.Adam\t.\tJly.4.1930\t.\t.\r\n
      5\tHenry.ALBERT\t.\t.\tJny.4.1900\t68.yrs\tAbbeyrange\r\n
      5\tLola\twife\t.\tDec.28.1909\t76.yrs\t.\r\n
      5\tJack.HARBOR\tson.in.law\t.\tJan.29.1976\t49.yrs\t.\r\n
      5\tJulie\twife\t.\tMay.29.1999\t72.yrs\t.\r\n
      


      To accomplish this, I have tried various regexp matches, including ^(\d+?)(\t.*). \t for matching (with \1\2.\1\t as the replace expression), but anything using “. matches newline” is greedy and matches the whole lot, no matter what I put on the end.

      Unfortunately, this did not produce the output I desired, and I’m not sure why. Could you please help me understand what went wrong and help me find the solution?

      —

      moderator added code markdown around text; please don’t forget to use the </> button to mark example text as “code” so that characters don’t get changed by the forum

      • CoisesC
        Coises @I Hay
        last edited by Coises

        @I-Hay said in Regexp multiline text replace when care about line delimiters:

        To accomplish this, I have tried various regexp matches, including ^(\d+?)(\t.*). \t for matching but anything using “. matches newline” is greedy and matches the whole lot, no matter what I put on the end. (using \1\2.\1\t for the replace expression),

        Unfortunately, this did not produce the output I desired, and I’m not sure why. Could you please help me understand what went wrong and help me find the solution?

        You were close. Try:

        Find what: (?-s)^(\d+)(\t.*\R) \t
        Replace with: \1\2\1\t

        and Replace All repeatedly, until there are no more matches.

        The problem with your expression was that it doesn’t do anything to identify the end of a line. If . matches newline is checked, the match goes right through the new line and eats the entire file; if it’s not checked, nothing in your expression can match the new line.

        The (?-s) in my expression has the same effect as unchecking . matches newline. The \R matches any line ending (CR, LF or CRLF).
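For readers who want to try the same idea outside Notepad++, here is a minimal Python sketch of the "Replace All repeatedly" approach. It is an illustration under stated assumptions, not a copy of what Notepad++ does internally: Python's re module has no \R, so \r?\n stands in for it (LF or CRLF only), and the function name fill_down and the sample rows are made up for the example.

```python
import re

# Assumption: Python's re has no \R, so \r?\n is used as a stand-in.
# "." does not match "\n" unless DOTALL is set, which mirrors (?-s).
# MULTILINE makes ^ match at every line start, as it does in Notepad++.
PATTERN = re.compile(r"^(\d+)(\t.*\r?\n) \t", re.MULTILINE)


def fill_down(text):
    """Copy each grave number onto the space-prefixed lines below it.

    Each pass fills only the first blank line under every numbered line,
    so the substitution is repeated until nothing matches any more.
    """
    while True:
        text, count = PATTERN.subn(r"\1\2\1\t", text)
        if count == 0:
            return text


# Hypothetical sample rows in the same shape as the thread's data.
sample = "No\tName\n1\tFred\n \tLiz\n \tAnn\n2\tUnmarked\n"
print(fill_down(sample))
```

Each pass can only fix one continuation line per grave, because the line that was just filled is consumed by the match; that is why the loop (or clicking Replace All again) is needed.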

        • guy038G
          guy038
          last edited by guy038

          Hello, @i-hay, @coises and All,

          I found an easy solution to your problem! For that purpose, I used the Edit > Line Operations > Reverse Line Order menu option, which is quite powerful when multi-line regexes are involved!


          So, let’s consider your INPUT text :

          No	Name	Relationship	Date of Birth	Date of Death	Age	Other information
          1	Fred BLOGGS	 	 	May 13 1800	93 yrs	Brotherton
           	Liz	wife	 	Apr 39 1840	64 yrs	 
           	Ann DIRGEWALL	(husband : Jay)	 	Jny 2 1955	61 yrs	Gable St
          2	Unmarked grave	 	 	 	 	 
          3	Unmarked grave	 	 	 	 	 
          4	Jack GARDNER	 	 	Dec 5 1967	75 yrs	Grove Rd 
           	Jane	wife	 	Sep 2 1969	70 yrs	Blackpool
           	Mary JONES	wife of Adam	 	Jly 4 1930	 	 
          5	Henry ALBERT	 	 	Jny 4 1900	68 yrs	Abbeyrange
           	Lola	wife	 	Dec 28 1909	76 yrs	 
           	Jack HARBOR	son in law	 	Jan 29 1976	49 yrs	 
           	Julie	wife	 	May 29 1999	72 yrs	 
          
          • First, select all the text, which needs re-numbering

          • Use the Edit > Line Operations > Reverse Line Order menu option

          => You should get this temporary OUTPUT text :

           	Julie	wife	 	May 29 1999	72 yrs	 
           	Jack HARBOR	son in law	 	Jan 29 1976	49 yrs	 
           	Lola	wife	 	Dec 28 1909	76 yrs	 
          5	Henry ALBERT	 	 	Jny 4 1900	68 yrs	Abbeyrange
           	Mary JONES	wife of Adam	 	Jly 4 1930	 	 
           	Jane	wife	 	Sep 2 1969	70 yrs	Blackpool
          4	Jack GARDNER	 	 	Dec 5 1967	75 yrs	Grove Rd 
          3	Unmarked grave	 	 	 	 	 
          2	Unmarked grave	 	 	 	 	 
           	Ann DIRGEWALL	(husband : Jay)	 	Jny 2 1955	61 yrs	Gable St
           	Liz	wife	 	Apr 39 1840	64 yrs	 
          1	Fred BLOGGS	 	 	May 13 1800	93 yrs	Brotherton
          No	Name	Relationship	Date of Birth	Date of Death	Age	Other information
          
          • Now, move back to the beginning of the reversed text

          • Open the Replace dialog ( Ctrl + H )

            • Uncheck all the box options

            • FIND (?s)^\x20(?=(?:.+?)^(\d+))

            • REPLACE \1

            • Select the Regular expression search mode

            • Click, once only, on the Replace All button

          => Your temporary text is then changed into :

          5	Julie	wife	 	May 29 1999	72 yrs	 
          5	Jack HARBOR	son in law	 	Jan 29 1976	49 yrs	 
          5	Lola	wife	 	Dec 28 1909	76 yrs	 
          5	Henry ALBERT	 	 	Jny 4 1900	68 yrs	Abbeyrange
          4	Mary JONES	wife of Adam	 	Jly 4 1930	 	 
          4	Jane	wife	 	Sep 2 1969	70 yrs	Blackpool
          4	Jack GARDNER	 	 	Dec 5 1967	75 yrs	Grove Rd 
          3	Unmarked grave	 	 	 	 	 
          2	Unmarked grave	 	 	 	 	 
          1	Ann DIRGEWALL	(husband : Jay)	 	Jny 2 1955	61 yrs	Gable St
          1	Liz	wife	 	Apr 39 1840	64 yrs	 
          1	Fred BLOGGS	 	 	May 13 1800	93 yrs	Brotherton
          No	Name	Relationship	Date of Birth	Date of Death	Age	Other information
          
          • Finally, run the Edit > Line Operations > Reverse Line Order menu option once more

          Here we are ! We get your expected OUTPUT text, below :

          No	Name	Relationship	Date of Birth	Date of Death	Age	Other information
          1	Fred BLOGGS	 	 	May 13 1800	93 yrs	Brotherton
          1	Liz	wife	 	Apr 39 1840	64 yrs	 
          1	Ann DIRGEWALL	(husband : Jay)	 	Jny 2 1955	61 yrs	Gable St
          2	Unmarked grave	 	 	 	 	 
          3	Unmarked grave	 	 	 	 	 
          4	Jack GARDNER	 	 	Dec 5 1967	75 yrs	Grove Rd 
          4	Jane	wife	 	Sep 2 1969	70 yrs	Blackpool
          4	Mary JONES	wife of Adam	 	Jly 4 1930	 	 
          5	Henry ALBERT	 	 	Jny 4 1900	68 yrs	Abbeyrange
          5	Lola	wife	 	Dec 28 1909	76 yrs	 
          5	Jack HARBOR	son in law	 	Jan 29 1976	49 yrs	 
          5	Julie	wife	 	May 29 1999	72 yrs	 
          

          Some details about the search regex :

          • First, the (?s) syntax is an in-line modifier which has the same effect as checking the . matches newline option of the Replace dialog

          • Then, the regex just looks for a space character \x20, at the beginning of a line ( ^ ), ONLY IF the look-ahead structure that comes next ( (?=..........) ) succeeds

          • This look-ahead scans forward, in a non-capturing group ( (?:....) ), over the shortest possible run of any characters, even new-line char(s) ( .+? ), until it reaches the nearest line beginning with a number ( ^\d+ ), which is stored as group 1, due to the embedded parentheses ( (\d+) )

          • In the replacement, each matched space character ( \x20 ) is then replaced with the contents of group 1, which is our desired number !
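The reverse, replace once, reverse again sequence described above can also be sketched in Python. This is only an approximation of the Notepad++ steps: the function name fill_down_reversed and the sample rows are hypothetical, re.MULTILINE is passed explicitly because Python's ^ does not match at line starts by default, the redundant non-capturing group is dropped, and the text is assumed to end with a line break so that reversing the lines does not glue two of them together.

```python
import re


def reverse_lines(text):
    """Reverse the order of lines, keeping each line's own line ending."""
    return "".join(reversed(text.splitlines(keepends=True)))


def fill_down_reversed(text):
    """Fill grave numbers in a single regex pass over the reversed text.

    Once the lines are reversed, the number belonging to a blank line lies
    somewhere AFTER it, so a look-ahead can capture the nearest line that
    starts with digits, and the leading space is replaced by that capture.
    Assumes the input ends with a line break.
    """
    reversed_text = reverse_lines(text)
    filled = re.sub(
        r"(?s)^\x20(?=.+?^(\d+))",  # (?s): "." also matches line breaks
        r"\1",
        reversed_text,
        flags=re.MULTILINE,  # ^ matches at every line start
    )
    return reverse_lines(filled)


# Hypothetical sample rows in the same shape as the thread's data.
sample = "No\tName\n1\tFred\n \tLiz\n2\tUnmarked\n \tJane\n \tMary\n"
print(fill_down_reversed(sample))
```

Because the look-ahead consumes nothing, every space-prefixed line is matched independently in one pass, which is why a single Replace All is enough in this variant.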

          Best Regards,

          guy038

          The Community of users of the Notepad++ text editor.