
    Find NULL Lines with RegEx

    General Discussion
    • Jerry Goedert

      Hi, I have found numerous files that contain a NULL line, no spaces, no CR\LF, no tab, just nothing.
      Example:
      Birth Date 1 Jan 1947
      Event Type: Residence
      Household Identifier 963991122

      “United States Public Records, 1970-2009”, database, FamilySearch
      The lines between are NULL. I don’t know what to use to find them and remove most of them.
      How can I detect the NULL line?
      Jerry

      • Alan Kilborn @Jerry Goedert

        @Jerry-Goedert

        Search in Regular Expression mode for ^$.

        Wait… what does this mean?

        no CR\LF

        If you don’t have that, you don’t have a “line”, so…
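For anyone who wants to try the same search outside the editor, here is a rough Python 3 sketch of the idea (an illustration only, not how Notepad++ implements it). It assumes the text was read in Python's text mode, which turns CR/LF pairs into "\n"; Python's `$` only treats "\n" as a line end, so `^$` may not match raw CR/LF text. The sample text and the helper names `count_empty_lines` and `drop_empty_lines` are made up for this example.

```python
import re

# Hypothetical sample modeled on the post, with an empty line between
# records. Assumption: it was read in text mode, so CR/LF is now "\n".
text = (
    "Birth Date 1 Jan 1947\n"
    "\n"
    "Event Type: Residence\n"
    "\n"
    "Household Identifier 963991122"
)

# In MULTILINE mode ^ and $ match at every line boundary, so ^$ is a
# zero-width match inside each empty line.
EMPTY_LINE = re.compile(r"^$", re.MULTILINE)

def count_empty_lines(s: str) -> int:
    return len(EMPTY_LINE.findall(s))

def drop_empty_lines(s: str) -> str:
    # ^\n deletes the line break that makes up an otherwise empty line.
    return re.sub(r"^\n", "", s, flags=re.MULTILINE)

print(count_empty_lines(text))  # 2
print(drop_empty_lines(text))

# Text with no line break at all has no empty "line" for ^$ to find.
print(count_empty_lines("Birth\x00Date\x001\x00Jan"))  # 0
```

The last call illustrates the point above: if there is no line ending, there is no line, so `^$` finds nothing.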

        • Terry R @Jerry Goedert

          @Jerry-Goedert said in Find NULL Lines with RegEx:

          contain a NULL line, no spaces, no CR\LF, no tab, just nothing.

          Can you turn on “Show All Characters”? It’s in the View menu, under “Show Symbol”.

          I would think it should look like this:

          [screenshot]

          With this feature turned on, you should see that the “NULL” line (as you describe it) is actually a line with no characters on it, so it still ends with a CR/LF pair.

          If yours doesn’t look like this after turning on that feature, post a screenshot like I did.

          Terry

          PS: If it’s a line, then it WILL have a line number.
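A rough Python 3 counterpart to turning on Show All Characters and reading the line numbers is to print each line with `repr()`, which spells out the invisible CR and LF bytes. This is only a sketch: the bytes below are a hypothetical sample assuming Windows CR/LF endings, and `numbered_lines` is a made-up helper name.

```python
# Hypothetical bytes as they might sit on disk, assuming Windows CR/LF
# line endings and an empty line between two records.
raw = b"Birth Date 1 Jan 1947\r\n\r\nEvent Type: Residence\r\n"

def numbered_lines(data: bytes):
    """Yield (line number, raw line) pairs, keeping each terminator."""
    # keepends=True keeps the b"\r\n" on each line, so an empty line
    # still shows up as its own entry instead of disappearing.
    for number, line in enumerate(data.splitlines(keepends=True), start=1):
        yield number, line

for number, line in numbered_lines(raw):
    # repr() makes the CR and LF bytes visible as \r and \n.
    print(number, repr(line))
# 1 b'Birth Date 1 Jan 1947\r\n'
# 2 b'\r\n'
# 3 b'Event Type: Residence\r\n'
```

Line 2 holds nothing but its CR/LF pair: empty, but still a numbered line.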

          • rdipardo @Jerry Goedert

            A NULL character can be matched by searching for the code point U+0000.

            • Ctrl + F
            • Find what: \x{0000} [^1]
            • Search Mode: Regular Expression
            • Click “Find All in Current Document” [^2]

            [screenshot: match_null.png]
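To run the same search from a script, here is a Python 3 sketch. Python's `re` module does not accept the `\x{0000}` form, but `\x00` names the same code point. The sample bytes and the `find_nuls` helper are hypothetical. Every hit lands on line 1 because the sample has no line breaks, which is also why the one-entry-per-line search option collapses all the results into a single entry.

```python
import re

# Hypothetical stand-in for a file like the one described here: NULs
# where whitespace would be, and no line breaks at all.
raw = b"Birth\x00Date\x001\x00Jan\x001947"

# Python's re has no \x{0000} syntax; \x00 names the same code point.
NUL = re.compile(rb"\x00")

def find_nuls(data: bytes):
    """Return (line number, byte offset) for each NUL, like "Find All"."""
    hits = []
    for m in NUL.finditer(data):
        # Line number = number of LF bytes before the match, plus one.
        line = data.count(b"\n", 0, m.start()) + 1
        hits.append((line, m.start()))
    return hits

print(find_nuls(raw))  # [(1, 5), (1, 10), (1, 12), (1, 16)]
```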

            You can recreate the sample text shown above using Python 3:

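            import re
            
            data = """
                Birth Date 1 Jan 1947
                Event Type: Residence
                Household Identifier 963991122
                """
            
            # Replace every whitespace character (spaces, the indentation and
            # the newlines themselves) with a NUL, so the output has no line breaks.
            text_with_nulls = re.sub(r'\s', '\x00', data).encode('ascii')
            
            # Write raw bytes so nothing is translated on the way to disk.
            with open('text_with_nulls.txt', 'wb') as file:
                file.write(text_with_nulls)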

            I’m guessing the file that @Jerry-Goedert described was generated by a government database using some ancient 7-bit character encoding. Empty record fields that are NULL in the database are probably showing up as single-byte strings containing only "\0".
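If that guess is right, the NULs are placeholders rather than lines, and one way to clean them up is to collapse each run of NULs into something else. A minimal Python 3 sketch, assuming you have decided what the NULs should become; `clean_nuls` and its default replacement (a single space) are hypothetical choices, not anything taken from the source data.

```python
import re

def clean_nuls(raw: bytes, replacement: bytes = b" ") -> bytes:
    """Collapse each run of NUL bytes into a single `replacement`."""
    # Hypothetical helper. The default (one space) is an assumption;
    # pass b"" to delete NULs outright, or a CR/LF pair to turn each
    # run into a line break, depending on what the NULs stood for.
    return re.sub(rb"\x00+", replacement, raw)

print(clean_nuls(b"Birth\x00Date\x00\x00\x001 Jan 1947"))  # b'Birth Date 1 Jan 1947'
```

To apply it to a file, read the file in binary mode (`'rb'`), pass the bytes through `clean_nuls`, and write the result back with `'wb'`, ideally to a copy so the original stays intact.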


            [^1]: Notepad++ uses the Boost regex engine, which supports this syntax.
            [^2]: Since there’s only one true “line” in the text shown above, you have to turn off the one-entry-per-line option to reproduce my example exactly:

            • Settings
            • Preferences
            • Searching
            • Uncheck “Search result Window: show only one entry per found line”

            [screenshot: no_single_search.png]

            • Alan Kilborn @rdipardo

              @rdipardo

              I think your guess as to the OP’s data may have hit the nail squarely on the head.

              It would have been abundantly clear earlier if the OP had posted a screenshot of what he was working with.

              The Community of users of the Notepad++ text editor.