Find 'Not Found' Matches in list

Fern99

I am trying to search for a ‘string’ (ex: ‘var’) followed by an ‘X-digit number’ (ex: 4 digits), from 0000 up to a specified value (ex: 1999).

var0000
var0001
…
var1999

I am trying to find which ones are not found in all of the opened files in Notepad++.
(or the first one that is not found)

I tried this with multireplace, but I was not successful.
I am asking here for help if there is a way to do this and how.
Thanks.

guy038

Hello @fern99 and All,

I think that the regex, below, should work properly !

It will list, in the Find Results panel, all the files which do NOT contain any string from var0000 to var1999, at any position.
For each file listed, it will highlight the very first character of that file.

FIND (?s-i)\A(?=.*var[01][0-9]{3})(*SKIP)(*F)|\A.

Remark : Do NOT use the almost identical regex (?s-i)\A(?=.*var[01][0-9]{3})(*SKIP)(*F)|\A, without the final regex dot . ! There seems to be a bug, as running this pseudo-regex several times in a row does NOT return the same results !
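
For anyone who wants to cross-check the result outside Notepad++, here is a minimal Python sketch of the same test. Python's built-in re module does not support the (*SKIP)(*F) verbs, so this sketch swaps in a negative look-ahead, which should give the same yes/no answer per file. The function name and the sample file names are hypothetical.

```python
import re

# Stand-in for the regex above (NOT the same syntax): matches the first
# character of a text only if no var0000..var1999 string occurs in it.
NO_VAR = re.compile(r"\A(?!.*var[01][0-9]{3}).", re.DOTALL)

def files_without_var(texts):
    """Return the names (name -> content mapping) with no var0000..var1999."""
    return [name for name, content in texts.items() if NO_VAR.search(content)]

# Hypothetical sample data standing in for the opened documents
sample = {
    "a.txt": "x = var0042\n",
    "b.txt": "nothing here\nvar2000 is out of range\n",
    "c.txt": "VAR0001 in upper case does not count\n",
}
print(files_without_var(sample))
```

As with the Notepad++ regex, the final dot means an empty file is never reported.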

Best Regards,

guy038

Terry R

@guy038 said in Find 'Not Found' Matches in list:

I think that the regex, below, should work properly !

I think you have the wrong interpretation of the question.

I see it like the numbers from 0000 to 1999 exist except say 0002 and 0007. OP wants whichever is the first missing number.
Let’s say it is 0007.

I think this can only be solved by a programming language such as pythonscript.

Terry

PS actually it has to be 0002 if testing in numerical order. OP needs to expand on detail.
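
A rough sketch of what such a script could look like, in plain Python. It assumes the text of every opened file has already been collected into a list of strings (how to get it from Notepad++, for example through the PythonScript plugin, is left out here). The function name and its parameters are made up for illustration.

```python
def missing_vars(texts, prefix="var", last=1999, width=4):
    """Return the names prefix+0000 .. prefix+last that occur in none of the texts."""
    expected = [prefix + str(n).zfill(width) for n in range(last + 1)]
    return [name for name in expected
            if not any(name in content for content in texts)]

# Hypothetical contents of three opened files; 0002 and 0007 are absent
texts = [
    "var0000 var0001 var0003",
    "var0004\nvar0005\nvar0006",
    "var0008 var0009",
]

missing = missing_vars(texts, last=9)
print(missing)       # every name that was not found
print(missing[0])    # the first one that was not found
```

The check is a plain case-sensitive substring test, so var0000 inside a longer token such as var00001 still counts as found.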

mkupper

I am puzzling over the subject line Find 'Not Found' Matches in list, which implies there is a list of things to search for. The desire is to scan the list and to stop scanning as soon as something in that list is not found.

In the post body, @Fern99 provides what seems to be an “example” of the list, which could be constructed on the fly from the text string ‘var’ followed by the numeric values 0000 up to 1999.

Either way, a list of things in a file, or a programmatically generated list, involves scripting which is outside the scope of Notepad++.

I also believe that BOOST has issues with scanning an entire file for not-something. For example, (?s-i)\A^((?!var0000).)*$ started misbehaving when a random text file was over 35000 characters, and it would also stop when it hit extended Unicode characters.

guy038

Hello, @fern99 , @terry-r, @mkupper and All,

@mkupper, as I’ve just finished a blog post about the use of the (*SKIP)(*F) feature, here is a regex which matches all the contents of any file which does NOT contain a specific string, anywhere :

(?s-i)\A.*var0000(*SKIP)(*F)|\A.*

It’s quite magic !

I tested it against a text file of 2,745,028 characters, containing 5,901 lines. And, of course, I re-saved the file, each time I changed the location of the var0000 string ( Case YES ) or the location of the var000 string ( case NO ), before running the search. So :

Whatever the location of the var0000 string, the above regex does NOT match anything
Whatever the location of the var000 string, the above regex DOES match all the text contents, so 2,745,040 characters, the same as Ctrl + A !

One IMPORTANT thing to understand is the current location of the regex engine when trying the left branch of the alternative (?s-i)\A.*var0000(*SKIP)(*F) :

If the left branch is matched, no back-tracking is allowed because of the (*SKIP) verb, but the regex engine location has skipped to right after the last char of the range .*var0000. As no regex part exists between (*SKIP) and (*F), this location does NOT change. Then, the (*F) syntax discards the match found so far. So, the regex engine tries the right branch of the alternative. But the \A location cannot be reached anymore. So, the whole process fails and NO match occurs, whatever the position of the var0000 string, in current file.
If the left branch is NOT matched, the engine location is STILL at the very beginning of current file and the regex engine simply tries the right alternative \A.* which, this time, is a successful match selecting all the characters of current file.

Of course, in this second case, as it selects everything, you could say that the regex engine is bugging and selects, by mistake, all the file contents. But luckily, it’s NOT the case ! To verify my assertion, just use, for example, the regex (?s-i)\A.*var0000(*SKIP)(*F)|\A.{10000} against a file of size > 10,000 bytes which does NOT contain the string var0000

=> You’ll see that, as expected, the FIRST 10,000 characters of the file have been selected, proving that the regex engine works properly, in that matter !
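
The two anchored cases above can be cross-checked with a small Python sketch. Python's built-in re module has no (*SKIP)(*F), so the sketch uses a negative look-ahead, \A(?!.*var0000), as a stand-in for the left branch; treat that equivalence as an assumption which holds only for these \A-anchored patterns. The helper name and the test strings are invented for the example.

```python
import re

def match_if_absent(text, tail):
    """Match `tail` from the very start of text, but only if 'var0000' is absent."""
    return re.search(r"\A(?!.*var0000)" + tail, text, re.DOTALL)

# Hypothetical test data: more than 10,000 chars, with 'var000' but no 'var0000'
without = "var000 only, " + "x" * 20000
with_it = without + " var0000"

# String present: no match at all, whatever the right-hand part is
print(match_if_absent(with_it, r".*"))

# String absent: \A.* takes everything, \A.{10000} takes the first 10,000 chars
print(len(match_if_absent(without, r".*").group()) == len(without))
print(len(match_if_absent(without, r".{10000}").group()))
```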

Now, what would happen if we omit the \A assertion in the right branch of the above regex ? Well :

If the left branch of the alternative is matched, as this match is cancelled, due to the (*SKIP)(*F) syntax, the engine normally tries the right branch .* and matches all the characters from after the string var0000 or from the current position to the very end of file
If the left branch of the alternative is NOT matched, the engine location is STILL at the very beginning of current file and the regex engine simply tries the right alternative .* which is a successful match, selecting all the characters from current location till the very end of the file

See the summary table, below :

•-----------------------•-------------------------------------•--------------------------------------------------------------------------------------------•
|   String 'var0000'    |                Regex                |                                      RESULTS of the                                        |
•-----------------------•-------------------------------------•--------------------------------------------------------------------------------------------•
| NO  ANYWHERE in file  | (?s-i)\A.*var0000(*SKIP)(*F)|\A     | Match the EMPTY location at the VERY BEGINNING of file                                     |
| NO  ANYWHERE in file  | (?s-i)\A.*var0000(*SKIP)(*F)|\A.{n} | Match the FIRST N char(s) of file, if possible                                             |
| NO  ANYWHERE in file  | (?s-i)\A.*var0000(*SKIP)(*F)|\A.*   | Match ALL chars of file                                                                    |
|                       |                                     |                                                                                            |
| NO  anywhere in file  | (?s-i)\A.*var0000(*SKIP)(*F)|       | Match an EMPTY location at CURRENT position                                                |
| NO  anywhere in file  | (?s-i)\A.*var0000(*SKIP)(*F)|.{n}   | Match N char(s) from CURRENT position, if possible                                         |
| NO  anywhere in file  | (?s-i)\A.*var0000(*SKIP)(*F)|.*     | Match ALL chars from CURRENT position till the very END of file                            |
|                       |                                     |                                                                                            |
|                       |                                     |                                                                                            |
| YES ANYWHERE in file  | (?s-i)\A.*var0000(*SKIP)(*F)|\A     | NO match at all                                                                            |
| YES ANYWHERE in file  | (?s-i)\A.*var0000(*SKIP)(*F)|\A.{n} | NO match at all                                                                            |
| YES ANYWHERE in file  | (?s-i)\A.*var0000(*SKIP)(*F)|\A.*   | NO match at all                                                                            |
|                       |                                     |                                                                                            |
| YES ANYWHERE in file  | (?s-i)\A.*var0000(*SKIP)(*F)|       | Match an EMPTY location AFTER 'var0000' or at current position                             |
| YES ANYWHERE in file  | (?s-i)\A.*var0000(*SKIP)(*F)|.{n}   | Match N char(s) from    AFTER 'var0000' or from CURRENT position, if possible              |
| YES ANYWHERE in file  | (?s-i)\A.*var0000(*SKIP)(*F)|.*     | Match ALL chars from    AFTER 'var0000' or from CURRENT position till the very END of file |
•-----------------------•-------------------------------------•--------------------------------------------------------------------------------------------•

An IMPORTANT thing to note is that the generic regex :

Regex A(*SKIP)(*F)|Regex B is strictly identical to Regex B, when the Regex A CANNOT match anything !

Best Regards,

guy038

guy038

Hi, @fern99, @terry-r, @mkupper and All,

Another example :

To list all the files, among your opened documents, which contain NONE of the strings ABC, JKL and XYZ, in upper case, use :

FIND (?s-i)(?=\A.*(?:ABC|JKL|XYZ))(*SKIP)(*F)|\A.
Check the Wrap around option
Click on the Find All in All Opened Documents button

REMARK : Do keep the LAST regex dot ( . ), after \A. If omitted, consecutive searches with this regex do NOT give the same results ! ( Bug ? )
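
For comparison, the same "contains none of these strings" test can be sketched in Python without any regex at all, using plain substring checks. This is only an illustration outside Notepad++; the function name, the dictionary of documents and the file names are hypothetical.

```python
def lacks_all(content, needles=("ABC", "JKL", "XYZ")):
    """True if content contains none of the needles (case-sensitive)."""
    return not any(needle in content for needle in needles)

# Hypothetical stand-in for the opened documents (name -> text)
docs = {
    "one.txt": "abc jkl xyz in lower case",
    "two.txt": "has JKL only",
    "three.txt": "plain text",
}
print([name for name, text in docs.items() if lacks_all(text)])
```

Unlike the regex with its final dot, this check would also report an empty document.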

BR

guy038