    How to find two or more non-consecutive tabs in a line?

    • Alan Kilborn @glossar

      @glossar said:

      regex that locates a line that contains no tab?

      There might be better ones, but this one seems to work:

      ^((?!\t).)*$
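
As a quick way to experiment outside Notepad++, the pattern can be tried line by line in Python. This is only a sketch: Python's re module is not the Boost engine that Notepad++ uses, and the sample lines are made up for illustration.

```python
import re

# Alan's pattern: every '.' is consumed only when the next character is not a TAB.
NO_TAB_LINE = re.compile(r"^((?!\t).)*$")

# Hypothetical sample lines (not taken from the thread).
lines = ["abcd", "ab\tcd", "", "efgh"]

no_tab_lines = [line for line in lines if NO_TAB_LINE.match(line)]
print(no_tab_lines)  # ['abcd', '', 'efgh']
```

The empty line is reported as well, because * allows zero characters.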

      • guy038

        Hi, @glossar, @alan-kilborn, and All,

        A second solution could be :

        SEARCH (?-s)(?=.*\t.*\t).+

        A third solution, using the Mark dialog without checking the Bookmark line option, could be :

        MARK (?-s)\t.*\t


        Note, @alan-kilborn, that your regex should be changed into :

        SEARCH (?-s)^.*?\t[^\t\r\n]+\t.*?$

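        to avoid wrong multi-line matches. However, this solution still misses some possibilities ! Because [^\t\r\n]+ needs at least one character between two tabs, a line is skipped when none of its tabs have text between them ( two consecutive tabs, for instance )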


        You may test these 3 regexes, above, against the sample test, below :

        ---------------------------- 1 TEXT block without TAB -----> KO <----- ( because NO tabulation )
        abcd
        ---------------------------- 1 TAB  without TEXT ----------> KO <----- ( because ONE tabulation ONLY )
        	
        ---------------------------- 2 TABs without TEXT ----------- OK ------
        		
        ---------------------------- 3 TABs without TEXT ----------- OK ------
        			
        ---------------------------- 1 TAB  + 1 TEXT block --------> KO <----- ( because ONE tabulation ONLY )
        abcd	
        	abcd
        ---------------------------- 1 TAB  + 2 TEXT blocks -------> KO <----- ( because ONE tabulation ONLY )
        abcd	efgh
        ---------------------------- 2 TABs + 1 TEXT block --------- OK ------
        efgh		
        	efgh	
        		efgh
        ---------------------------- 2 TABs + 2 TEXT blocks -------- OK ------
        abcd	efgh	
        abcd		ijkm
        	efgh	ijkl
        ---------------------------- 2 TABs + 3 TEXT blocks -------- OK ------
        abcd	efgh	ijkl
        ---------------------------- 3 TABs + 1 Text block --------- OK ------
        abcd			
        	efgh		
        		ijkl	
        			mnop
        ---------------------------- 3 TABs + 2 Text blocks -------- OK ------
        abcd	efgh		
        abcd		ijkl	
        abcd			monp
        	efgh	ijkl	
        	efgh		monp
        		ijkl	monp
        ---------------------------- 3 TABs + 3 Text blocks -------- OK ------
        abcd	efgh	ijkm	
        	efgh	ijkl	mnop
        ---------------------------- 3 TABs + 4 Text blocks -------- OK ------
        abcd	efgh	ijkl	mnop
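
A small Python harness can approximate this comparison on a handful of lines modelled on the sample above. It is a sketch under assumptions: Python's re engine stands in for Boost, the (?-s) prefix is dropped because Python's . excludes line breaks unless re.DOTALL is set, and the names PATTERNS, LINES and matching_lines are invented here.

```python
import re

# The three regexes from this post, minus the (?-s) prefix.
PATTERNS = {
    "lookahead": r"(?=.*\t.*\t).+",
    "mark": r"\t.*\t",
    "strict": r"^.*?\t[^\t\r\n]+\t.*?$",
}

# A few lines in the spirit of the sample test (0, 1 and 2 tabs).
LINES = ["abcd", "\t", "\t\t", "abcd\tefgh", "abcd\t\tijkm", "abcd\tefgh\tijkl"]


def matching_lines(pattern):
    """Return the sample lines in which the pattern finds a match."""
    regex = re.compile(pattern)
    return [line for line in LINES if regex.search(line)]


results = {name: matching_lines(pattern) for name, pattern in PATTERNS.items()}
for name, hits in results.items():
    print(name, [repr(hit) for hit in hits])
```

In this run the first two patterns report every line with at least two tabs, while the strict variant only reports the last line, since the other two-tab lines have nothing between their tabs.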
        

        Best Regards,

        guy038

        • PeterJones

          @glossar , @Alan-Kilborn , @Meta-Chuh , et alia,

          Unfortunately, the (?-s) only changes the behavior of . with respect to newlines; it doesn’t change character classes, so [^\t]+ means “one or more characters that aren’t a TAB, even if those characters are newlines”. By changing the full regex to (?-s)^.*?\t[^\t\r\n]+\t.*?$, I was able to get it to skip lines like @Meta-Chuh’s example with x instead of the TAB. The expression [^\t\r\n]+ means “match one or more characters, none of which is a TAB, CR (carriage return), or LF (line feed)”.
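
The same distinction can be observed in Python, where . also excludes line breaks by default while a negated class does not. A minimal sketch with a made-up two-line string (Python's re is used as a stand-in here, not the Boost engine):

```python
import re

# Two lines, each containing exactly one TAB (hypothetical sample).
text = "a\tb\nc\td"

# '.' stops at the line break, so no TAB...TAB pair is found.
dot_match = re.search(r"\t.+\t", text)

# [^\t] accepts the line break, so the match runs across both lines.
loose_match = re.search(r"\t[^\t]+\t", text)

# Excluding \r and \n keeps the class inside a single line again.
strict_match = re.search(r"\t[^\t\r\n]+\t", text)

print(dot_match)                   # None
print(repr(loose_match.group(0)))  # '\tb\nc\t'
print(strict_match)                # None
```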

          I am not as much of a regex expert as @guy038, so I may be misinterpreting; however, the Boost docs say (emphasis mine):

          Escaped Characters
          All the escape sequences that match a single character, or a single character class are permitted within a character class definition. For example [\[\]] would match either of [ or ] while [\W\d] would match any character that is either a “digit”, or is not a “word” character.

          Since \R doesn’t match a “single character” (it can match either a single character or a pair of characters; see Boost’s “Matching Line Endings” section), it doesn’t fall within the escape sequences permitted in a character class.

          edit: while typing this up, four more posts were made. Hopefully, I still added to the discussion.
          edit 2: clarify the \R

          • Alan Kilborn @PeterJones

            @PeterJones said:

            Hopefully, I still added to the discussion.

            You did, and you helped make it an “interesting discussion”. Thanks.

            • glossar

              Alan, the second one that finds no-tab :), works, thank you.

              Guy and Peter - Thank you for stepping-in! :) Much appreciated!

              Have a nice day!

              • guy038

                Hi, @glossar, @alan-kilborn, @meta-chuh, @peterjones, and All,

                Here is another solution, which matches the whole contents of lines containing at least 2 tabulation chars ( can’t do shorter ! ) :

                SEARCH (?-s).*\t.*\t.*

                Just for information, another formulation of Alan’s regex, which searches for lines which do not contain any tabulation char, could be :

                SEARCH (?!.*\t)^.+


                Negative character classes are often misunderstood, indeed ! When you’re using, for instance, the negative character class below :

                [^<char1><char2><char3>-<char4>]

                It will match ANY Unicode character which is DIFFERENT from <char1>, <char2> and every character between <char3> and <char4> inclusive. So, most of the time, it also matches the \r and \n END of Line characters. To avoid matching these line-break chars, just insert \r and \n inside the negative class, at any location after the ^, except inside a range :

                [^<char1>\n<char2>\r<char3>-<char4>]

                Cheers,

                guy038

                • glossar @Alan Kilborn

                  @Alan-Kilborn said:

                  @glossar said:

                  regex that locates a line that contains no tab?

                  There might be better ones, but this one seems to work:

                  ^((?!\t).)*$

                  Hi @alan-kilborn,
                  Is it possible for you to modify this regex so that it skips blank lines, i.e. the ones containing no characters at all, just (if applicable, ^ and) \r\n ? Currently the regex finds blank lines as well, since they, too, meet the “no-tab” criterion.

                  Thanks in advance!

                  • guy038

                    Hi, @glossar, @alan-kilborn, @meta-chuh, @peterjones, and All,

                    I may be mistaken, but I think that the regex (?!.*\t)^.+, from my previous post, already meets your needs, doesn’t it ?

                    Cheers,

                    guy038

                    • Alan Kilborn @glossar

                      @glossar said:

                      Is it possible for you to modify this regex so that it skips blank lines

                      So we should look at what the original means:

                      ^((?!\t).)*$

                      It says (basically) to match zero or more occurrences (because of the use of *) of anything that is not TAB. Suppose we change it to match ONE or more occurrences of anything that is not TAB (we’re going to change * to + to do this). Because we then have to match at least ONE thing, empty/blank lines are no longer matched:

                      ^((?!\t).)+$

                      Which is basically what @guy038 said, but I wanted to elaborate a bit!
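
The effect of swapping * for + can be checked with a short Python sketch (hypothetical sample lines, with Python's re standing in for the Boost engine):

```python
import re

STAR = re.compile(r"^((?!\t).)*$")  # zero or more non-TAB characters
PLUS = re.compile(r"^((?!\t).)+$")  # one or more non-TAB characters

# Made-up lines: plain text, an empty line, a line with a TAB, plain text.
lines = ["abcd", "", "ab\tcd", "efgh"]

star_hits = [line for line in lines if STAR.match(line)]
plus_hits = [line for line in lines if PLUS.match(line)]

print(star_hits)  # ['abcd', '', 'efgh']
print(plus_hits)  # ['abcd', 'efgh']
```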

                      • guy038

                        Hi, @glossar, @alan-kilborn, @meta-chuh, @peterjones and All,

                        Fundamentally, Alan’s new solution and mine give the same correct results, i.e. they match any non-empty line which does not contain a tabulation character !

                        By the way, we both forgot to add the leading in-line modifier (?-s) to be sure that, even if you previously ticked the . matches newline option, the regex engine will treat any . as matching a single standard character only, never a line-break character !

                        So, our two solutions should be :

                        Alan : (?-s)^((?!\t).)+$

                        Guy : (?-s)(?!.*\t)^.+


                        However, note that the logic, underlying these 2 regular expressions, is a bit different :

                        • In Alan’s regex, from the beginning of line ( ^ ), the regex engine matches one or more standard characters, up to the end of line ( $ ), ONLY IF each standard character encountered is not a tabulation character, due to the negative look-ahead (?!\t), located right before the . regex character

                        • In Guy’s regex, the regex engine matches all the standard characters of a line ( ^.+ ), ONLY IF ( implicitly at beginning of line ) it cannot find a tabulation character further on, at any position of the current line, due to the negative look-ahead (?!.*\t)

                        I did a test with a file of 2,500,000 lines, half of which contained 1 tabulation character and, clearly, Alan’s version is faster ! ( 2 min 15 s for Alan’s version instead of 5 min for mine )
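
For anyone wanting to repeat this kind of comparison, here is a rough Python sketch. It assumes Python's re engine, so the timings need not resemble those measured in Notepad++ (Boost), and the generated text as well as the names build_text and matched_lines are made up for illustration. re.MULTILINE is needed so that ^ and $ anchor at every line.

```python
import re
import timeit

ALAN = re.compile(r"^((?!\t).)+$", re.MULTILINE)
GUY = re.compile(r"(?!.*\t)^.+", re.MULTILINE)


def build_text(n_lines):
    """Build a hypothetical test text: every other line contains one TAB."""
    rows = []
    for i in range(n_lines):
        separator = "\t" if i % 2 == 0 else " "
        rows.append(f"line{i}{separator}value")
    return "\n".join(rows)


def matched_lines(regex, text):
    """Return every match of the regex in the text."""
    return [m.group(0) for m in regex.finditer(text)]


text = build_text(20_000)
alan_hits = matched_lines(ALAN, text)
guy_hits = matched_lines(GUY, text)

# Time three full passes of each regex over the same text.
alan_time = timeit.timeit(lambda: matched_lines(ALAN, text), number=3)
guy_time = timeit.timeit(lambda: matched_lines(GUY, text), number=3)

print(f"same result: {alan_hits == guy_hits}, lines matched: {len(alan_hits)}")
print(f"Alan: {alan_time:.3f} s   Guy: {guy_time:.3f} s")
```

Both expressions should report the same tab-free lines here; only the elapsed time is expected to differ, and by how much depends on the engine and the data.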

                        BR

                        guy038


                        The Community of users of the Notepad++ text editor.