
    How to find two or more non-consecutive tabs in a line?

    General Discussion
    21 Posts 5 Posters 4.5k Views
    • Alan Kilborn @glossar

      @glossar

      How about this?

      (?-s)^.*?\t[^\t]+\t.*?$

      • glossar

        Hi Alan,

        Thank you, but sadly it won’t work. At least in my file, it matches tabs spread over every other line, whereas it should locate a single line that contains 2 or more tabs (e.g.: blah [tab] blah blah more blah [tab] (blah blah [tab] blah)…).

        • Alan Kilborn @Alan Kilborn

          This raises what may be an interesting discussion: when are characters inside a character class (that is, between [ and ]) not literal? When I first crafted the regex above, I thought it wouldn’t work because it would look for \ or t separately, not for TAB characters. But lo and behold, it does look for tabs. What are the rules here?

          I know that [\R] matches \ or R rather than acting as \R, but that may be a special case: \R is invalid there because it can match 2 characters, not just one.

          But there must be some general rules about what is special inside […] and [^…], besides the “specialness” of - when used as a range operator, as in [a-z], and the special handling needed to include ] in the set…

          • Alan Kilborn @glossar

            @glossar said:

            Thank you but sadly it won’t work.

            Hmmm. It works for me with a Mark operation, as shown in my screenshot:

            [Screenshot hosted on Imgur; image not available]

            I copied your text from this thread, did a regex replace of \[tab\] with \t, and then applied the regex given earlier to red-mark the text.

            • glossar

              I can confirm that it finds a line that contains two tabs, but if a line doesn’t meet the criteria, the match keeps going (greedy, you say? :) ) and takes in the following line too, which in the end looks like “every other line”. I’m pretty sure it runs past the \r\n of a line if that line contains only one tab. Can you limit the regex so that it looks only within a single line (by line, I mean anything between ^ and \r\n)?

              • Alan Kilborn @glossar

                @glossar

                Ah, yes, okay, that makes sense. The [^\t]+ part will match across line boundaries. At this point I will bow out and let the regex master @guy038 step in… :)

                And maybe he can comment on my “interesting discussion” post above as well.
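A minimal Python sketch of the symptom glossar describes. It assumes Python's re module as a stand-in for Notepad++'s Boost engine (the two differ in details; for example, Python's . does match \r), uses \n line endings, and drops (?-s) because Python's . never matches \n by default. Two lines holding one TAB each come back as a single match:

```python
import re

# Alan's first pattern. Python's "." already excludes "\n", so the Notepad++
# prefix (?-s) is left out; re.MULTILINE lets ^ and $ match at each line.
FIRST_TRY = re.compile(r"^.*?\t[^\t]+\t.*?$", re.MULTILINE)

# Two lines with ONE tab each ("\n" line endings, for simplicity).
text = "one\ttab here\nanother\tline\n"

match = FIRST_TRY.search(text)
# [^\t]+ consumed the "\n", so the single match spans both lines.
print(repr(match.group()))  # 'one\ttab here\nanother\tline'
```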

                • Meta Chuh (moderator) @glossar

                  Maybe a screenshot helps:
                  [Screenshot hosted on Imgur; image not available]

                  • glossar

                    I can’t see the screenshots above, neither on this page nor when clicking on them. All I see is a broken-image icon with “Imgur” next to it.

                    • Alan Kilborn

                      Okay, one more try. It could be as simple(!) as changing it to this:

                      (?-s)^.*?\t(?!\t).+?\t.*?$

                      :)
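Under the same assumptions (Python's re standing in for Boost, \n line endings, (?-s) dropped), a quick check of the revised pattern. Because (?!\t) requires a non-TAB character right after the first TAB, a line whose only two TABs are adjacent is not matched, which fits the "non-consecutive" wording of the question:

```python
import re

# Alan's revised pattern. (?!\t) insists on a non-TAB right after the first
# TAB, and ".+?" cannot run past "\n", so each match stays on one line.
REVISED = re.compile(r"^.*?\t(?!\t).+?\t.*?$", re.MULTILINE)

lines = [
    "one\ttab only",     # 1 TAB: not matched
    "abcd\tefgh\tijkl",  # 2 TABs separated by text: matched
    "abcd\t\tijkm",      # 2 adjacent TABs only: not matched
]
text = "\n".join(lines) + "\n"

matches = [m.group() for m in REVISED.finditer(text)]
print(matches)  # ['abcd\tefgh\tijkl']
```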

                      • glossar

                        Thanks, that now works like a charm! :)

                        While we are at it, how about building another regex that locates a line that contains no tab? :)

                        • Alan Kilborn @glossar

                          @glossar said:

                          regex that locates a line that contains no tab?

                          There might be better ones, but this one seems to work:

                          ^((?!\t).)*$
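A per-line check of this pattern in Python (a sketch, with Python's re standing in for Notepad++'s engine). Since * allows zero repetitions, an empty line also counts as a line with no TAB:

```python
import re

# Alan's "no TAB" pattern: each character between ^ and $ must pass (?!\t).
NO_TAB = re.compile(r"^((?!\t).)*$")

for line in ["abcd", "a\tb", ""]:
    print(repr(line), bool(NO_TAB.search(line)))
# Prints:
# 'abcd' True
# 'a\tb' False
# '' True
```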

                          • guy038

                            Hi, @glossar, @alan-kilborn, and All,

                            A second solution could be:

                            SEARCH (?-s)(?=.*\t.*\t).+

                            A third solution, using the Mark dialog without ticking the Bookmark line option, could be:

                            MARK (?-s)\t.*\t


                            Note, @alan-kilborn, that your regex should be changed to:

                            SEARCH (?-s)^.*?\t[^\t\r\n]+\t.*?$

                            to avoid wrong multi-line matches. However, this solution still misses some possibilities, for instance lines whose tabs are never separated by text, such as abcd[tab][tab]ijkm!
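The corrected pattern can be spot-checked in Python (a sketch; Python's re stands in for Boost, \n line endings are assumed, and (?-s) is dropped because Python's . already excludes \n). It no longer joins two one-TAB lines, but it still skips lines where no text separates the TABs:

```python
import re

# guy038's correction: \r and \n in the class stop it at the end of a line.
CORRECTED = re.compile(r"^.*?\t[^\t\r\n]+\t.*?$", re.MULTILINE)

samples = [
    ("one\ttab here\nanother\tline", []),                 # two 1-TAB lines: not joined
    ("abcd\tefgh\tijkl", ["abcd\tefgh\tijkl"]),           # separated TABs: found
    ("abcd\t\tijkm", []),                                 # adjacent TABs: still missed
    ("\t\t", []),                                         # TABs only: still missed
]
for text, expected in samples:
    assert CORRECTED.findall(text) == expected, repr(text)
print("all samples behave as described")
```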


                            You may test these 3 regexes against the sample text below:

                            ---------------------------- 1 TEXT block without TAB -----> KO <----- ( because NO tabulation )
                            abcd
                            ---------------------------- 1 TAB  without TEXT ----------> KO <----- ( because ONE tabulation ONLY )
                            	
                            ---------------------------- 2 TABs without TEXT ----------- OK ------
                            		
                            ---------------------------- 3 TABs without TEXT ----------- OK ------
                            			
                            ---------------------------- 1 TAB  + 1 TEXT block --------> KO <----- ( because ONE tabulation ONLY )
                            abcd	
                            	abcd
                            ---------------------------- 1 TAB  + 2 TEXT blocks -------> KO <----- ( because ONE tabulation ONLY )
                            abcd	efgh
                            ---------------------------- 2 TABs + 1 TEXT block --------- OK ------
                            efgh		
                            	efgh	
                            		efgh
                            ---------------------------- 2 TABs + 2 TEXT blocks -------- OK ------
                            abcd	efgh	
                            abcd		ijkm
                            	efgh	ijkl
                            ---------------------------- 2 TABs + 3 TEXT blocks -------- OK ------
                            abcd	efgh	ijkl
                            ---------------------------- 3 TABs + 1 Text block --------- OK ------
                            abcd			
                            	efgh		
                            		ijkl	
                            			mnop
                            ---------------------------- 3 TABs + 2 Text blocks -------- OK ------
                            abcd	efgh		
                            abcd		ijkl	
                            abcd			monp
                            	efgh	ijkl	
                            	efgh		monp
                            		ijkl	monp
                            ---------------------------- 3 TABs + 3 Text blocks -------- OK ------
                            abcd	efgh	ijkm	
                            	efgh	ijkl	mnop
                            ---------------------------- 3 TABs + 4 Text blocks -------- OK ------
                            abcd	efgh	ijkl	mnop
                            
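A Python sketch that checks guy038's first two patterns against some of the sample lines above, reading OK as "at least two TABs". Assumptions: Python's re stands in for Boost, (?-s) is dropped, and each line is tested on its own. For the Mark pattern only the presence of a match matters, since it marks just the span from the first TAB to the last:

```python
import re

# guy038's SEARCH and MARK patterns, (?-s) dropped (Python's "." excludes "\n").
SEARCH = re.compile(r"(?=.*\t.*\t).+")
MARK = re.compile(r"\t.*\t")

# Some lines from the sample test, written with \t escapes.
sample = [
    "abcd",                # KO: no TAB
    "\t",                  # KO: one TAB
    "\t\t",                # OK
    "abcd\t",              # KO
    "abcd\tefgh",          # KO
    "efgh\t\t",            # OK
    "abcd\t\tijkm",        # OK
    "abcd\tefgh\tijkl",    # OK
    "\tefgh\tijkl\tmnop",  # OK
]

for line in sample:
    expected = line.count("\t") >= 2  # the OK/KO rule of the sample test
    assert bool(SEARCH.search(line)) == expected, repr(line)
    assert bool(MARK.search(line)) == expected, repr(line)
print("both patterns agree with the OK/KO labels")
```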

                            Best Regards,

                            guy038

                            • PeterJones

                              @glossar , @Alan-Kilborn , @Meta-Chuh , et alia,

                              Unfortunately, the (?-s) only changes the behavior of . with respect to newlines; it doesn’t change character classes, so [^\t]+ means “one or more characters that aren’t a TAB, even if those characters are newlines”. By changing the full regex to (?-s)^.*?\t[^\t\r\n]+\t.*?$, I was able to get it to skip lines like @Meta-Chuh’s example of x instead of the TAB. The class [^\t\r\n] means “match one character that is not a TAB, CR (carriage return), or LF (line feed)”, and the + after it makes that “one or more”.
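The point about (?-s) can be seen with Python's re module, whose DOTALL flag plays the role of (?s) (a sketch; Python is a stand-in for Notepad++'s engine here). The flag changes what . matches, but a negated class like [^\t] matches \n either way:

```python
import re

text = "abc\ndef"  # two lines, no TAB

# "." stops at the line break unless DOTALL (Python's counterpart of (?s)) is on...
print(bool(re.fullmatch(r".+", text)))             # False
print(bool(re.fullmatch(r".+", text, re.DOTALL)))  # True
# ...but a negated class ignores that flag: [^\t] happily matches "\n".
print(bool(re.fullmatch(r"[^\t]+", text)))         # True
# Listing the line-break characters in the class keeps it inside one line.
print(bool(re.fullmatch(r"[^\t\r\n]+", text)))     # False
```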

                              I am not as much of a regex expert as @guy038, so I may be misinterpreting; however, the Boost docs say (emphasis mine):

                              Escaped Characters
                              All the escape sequences that match a single character, or a single character class are permitted within a character class definition. For example [\[\]] would match either of [ or ] while [\W\d] would match any character that is either a “digit”, or is not a “word” character.

                              Since \R doesn’t match a “single character” (it can match either a single character or a pair of characters; see Boost’s “Matching Line Endings” section), it isn’t among the escape sequences permitted inside a character class.
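Python's re module follows the same rule for these cases, so it can illustrate Alan's question (a sketch; Python is not Boost, and it has no \R escape at all): \t inside a class is a TAB, and class shorthands such as \W and \d can be combined:

```python
import re

# Inside [...], \t is still the TAB escape, not "\" or "t".
print(bool(re.fullmatch(r"[\t]", "\t")))  # True
print(bool(re.fullmatch(r"[\t]", "t")))   # False
print(bool(re.fullmatch(r"[\t]", "\\")))  # False

# Class shorthands combine: [\W\d] means "a digit OR a non-word character".
DIGIT_OR_NONWORD = re.compile(r"[\W\d]")
print([c for c in "a1-_ " if DIGIT_OR_NONWORD.fullmatch(c)])  # ['1', '-', ' ']
```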

                              edit: while typing this up, four more posts were made. Hopefully, I still added to the discussion.
                              edit 2: clarify the \R

                              • Alan Kilborn @PeterJones

                                @PeterJones said:

                                Hopefully, I still added to the discussion.

                                You did, and you helped make it an “interesting discussion”. Thanks.

                                • glossar

                                  Alan, your second regex, the one that finds lines with no tab, works :) Thank you.

                                  Guy and Peter, thank you for stepping in! :) Much appreciated!

                                  Have a nice day!

                                  • guy038

                                    Hi, @glossar, @alan-kilborn, @meta-chuh, @peterjones, and All,

                                    Here is another solution, which matches the entire content of any line containing at least 2 tabulation chars (can’t do shorter!):

                                    SEARCH (?-s).*\t.*\t.*
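A quick Python check of this shortest form (again with Python's re as a stand-in for Notepad++'s Boost engine, (?-s) dropped, and \n line endings assumed). Because . cannot cross a line break, each match is confined to one line; on a line with two or more TABs the leftmost match starts at the line's beginning, and the final greedy .* carries it to the line's end:

```python
import re

# guy038's shortest form; "." never crosses "\n", so matches stay on one line.
TWO_TABS = re.compile(r".*\t.*\t.*")

# Lines: 1 TAB, 1 TAB, 2 TABs only, 2 adjacent TABs with text.
text = "abcd\tefgh\none\ttab\n\t\t\nabcd\t\tijkm\n"
print(TWO_TABS.findall(text))  # ['\t\t', 'abcd\t\tijkm']
```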

                                    Just for information, another formulation of Alan’s regex, which searches for lines that do not contain any tabulation char, could be:

                                    SEARCH (?!.*\t)^.+


                                    Negative character classes are often misunderstood, indeed! When you use, for instance, the negative character class below:

                                    [^<char1><char2><char3>-<char4>]

                                    It will match ANY Unicode character which is DIFFERENT from <char1>, from <char2>, and from all the characters between <char3> and <char4> inclusive. So, unless you exclude them, it also matches the \r and \n end-of-line characters. To avoid matching these line-break chars, just insert \r and \n inside the negative class, anywhere after the ^, but not inside a range:

                                    [^<char1>\r<char2>\n<char3>-<char4>]
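A small Python illustration of this advice (a sketch; Python's re is used in place of Notepad++'s engine, and the letters a-c stand in for the <char…> placeholders):

```python
import re

text = "xyz\nabc\nuvw"

# [^a-c] only excludes a, b and c, so the line breaks are fair game...
print(re.findall(r"[^a-c]+", text))      # ['xyz\n', '\nuvw']
# ...while adding \r and \n right after the ^ keeps every match inside one line.
print(re.findall(r"[^\r\na-c]+", text))  # ['xyz', 'uvw']
```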

                                    Cheers,

                                    guy038

                                    • glossar @Alan Kilborn

                                      @Alan-Kilborn said:

                                      @glossar said:

                                      regex that locates a line that contains no tab?

                                      There might be better ones, but this one seems to work:

                                      ^((?!\t).)*$

                                      Hi @alan-kilborn,
                                      Is it possible for you to modify this regex so that it skips blank lines, i.e. the ones containing no characters at all, just (if applicable, ^ and) \r\n? Currently the regex finds blank lines as well, since they, too, meet the “no tab” criterion.

                                      Thanks in advance!

                                      • guy038

                                        Hi, @glossar, @alan-kilborn, @meta-chuh, @peterjones, and All,

                                        I may be mistaken, but I think the regex (?!.*\t)^.+ from my previous post meets your needs, doesn’t it?

                                        Cheers,

                                        guy038

                                        • Alan Kilborn @glossar

                                          @glossar said:

                                          Is it possible for you to modify this regex so that it skips blank lines

                                          So we should look at what the original means:

                                          ^((?!\t).)*$

                                          It says (basically) to match zero or more occurrences (because of the *) of anything that is not TAB. If we change it to match ONE or more occurrences of anything that is not TAB (by changing * to +), then empty/blank lines are no longer matched, because at least ONE character has to be matched:

                                          ^((?!\t).)+$

                                          Which is basically what @guy038 said, but I wanted to elaborate a bit!

                                          • guy038

                                            Hi, @glossar, @alan-kilborn, @meta-chuh, @peterjones and All,

                                            Fundamentally, Alan’s new solution and mine give the same correct results, i.e. they match any non-empty line which does not contain a tabulation character!

                                            By the way, we both forgot to add the leading in-line modifier (?-s), which ensures that, even if you previously ticked the . matches newline option, the regex engine treats each . as matching a single standard character only (never a line-break char)!

                                            So, our two solutions should be :

                                            Alan : (?-s)^((?!\t).)+$

                                            Guy : (?-s)(?!.*\t)^.+


                                            However, note that the logic underlying these 2 regular expressions is a bit different:

                                            • In Alan’s regex, from the beginning of the line (^), the regex engine matches one or more standard characters up to the end of the line ($), ONLY IF each character encountered is not a tabulation character, due to the negative look-ahead (?!\t) located right before the . regex character

                                            • In Guy’s regex, the regex engine matches all the standard characters of a line (^.+) ONLY IF, from the beginning of the line, it cannot find a tabulation character at any further position of the current line, due to the negative look-ahead (?!.*\t)

                                            I did a test with a file of 2,500,000 lines, half of which contained 1 tabulation character, and clearly Alan’s version is faster! (2 min 15 s for Alan’s instead of 5 min for mine)
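The claim that both solutions return the same lines can be spot-checked in Python (a sketch; Python's re stands in for Boost, (?-s) is dropped because Python's . already excludes \n, and \n line endings are assumed). It makes no attempt to reproduce the timing figures above:

```python
import re

# The two "non-empty line without TAB" patterns, per line via re.MULTILINE.
ALAN = re.compile(r"^((?!\t).)+$", re.MULTILINE)
GUY = re.compile(r"(?!.*\t)^.+", re.MULTILINE)

# Lines: text, empty, text with TAB, text, TAB only.
text = "abcd\n\na\tb\nefgh\n\t\n"

alan = [m.group() for m in ALAN.finditer(text)]
guy = [m.group() for m in GUY.finditer(text)]
print(alan)  # ['abcd', 'efgh']
print(guy)   # ['abcd', 'efgh']
```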

                                            BR

                                            guy038

                                            The Community of users of the Notepad++ text editor.