Find 'Not Found' Matches in list
-
I am trying to search for a string (e.g. ‘var’) followed by a fixed number of digits (e.g. 4), running from 0000 up to a specified value (e.g. 1999):
var0000
var0001
…
var1999
I am trying to find which ones are not found in all of the opened files in Notepad++ (or the first one that is not found).
I tried this with multireplace, but I was not successful.
I am asking here for help if there is a way to do this and how.
Thanks. -
Hello @fern99 and All,
I think that the regex, below, should work properly !
-
It will list, in the Find Results panel, all the files which do NOT contain any string from var0000 to var1999, at any position in the file.
-
For each file listed, it will highlight the very first character of that file.
FIND
(?s-i)\A(?=.*var[01][0-9]{3})(*SKIP)(*F)|\A.
Remark : Do NOT use the almost identical regex (?s-i)\A(?=.*var[01][0-9]{3})(*SKIP)(*F)|\A , without the final regex dot ! There is, somehow, a bug, as running this pseudo-regex several times in succession does NOT return the same results !
Best Regards,
guy038
-
-
@guy038 said in Find 'Not Found' Matches in list:
I think that the regex, below, should work properly !
I think you have the wrong interpretation of the question.
I read it as: the numbers from 0000 to 1999 exist, except, say, 0002 and 0007, and the OP wants whichever is the first missing number. Let’s say it is 0007.
I think this can only be solved with a programming language, such as PythonScript.
Terry
PS actually it has to be 0002 if testing in numerical order. OP needs to expand on detail.
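Terry's reading of the question (generate the names, then report which ones never occur) can be sketched in plain Python. This is only a minimal sketch, not a Notepad++ or PythonScript API: the function names `expected_names` and `missing_names` are made up for illustration, the file contents are passed in as ordinary strings, and a name counts as present when it occurs as a substring anywhere.

```python
def expected_names(prefix="var", digits=4, last=1999):
    """Build prefix0000 .. prefix<last>, zero-padded to `digits` digits."""
    return [f"{prefix}{number:0{digits}d}" for number in range(last + 1)]


def missing_names(texts, prefix="var", digits=4, last=1999):
    """Return, in numerical order, the names found in none of `texts`."""
    texts = list(texts)  # allow generators; the texts are scanned once per name
    return [
        name
        for name in expected_names(prefix, digits, last)
        # Assumption: plain substring test, so 'var00021' also counts as 'var0002'.
        if not any(name in text for text in texts)
    ]


if __name__ == "__main__":
    # Hypothetical sample contents standing in for the opened files.
    samples = ["var0000 var0001", "something var0003"]
    missing = missing_names(samples, last=4)
    print(missing)                          # ['var0002', 'var0004']
    print(missing[0] if missing else None)  # var0002
```

The first element of the result is the first missing name when testing in numerical order; with real files, the strings would come from reading each file's text first.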
-
I am puzzling over the subject line
Find 'Not Found' Matches in list
which implies there is a list of things to search for. The desire is to scan the list and to stop scanning as soon as something in that list is not found.
In the post body @Fern99 provides what seems to be an “example” of the list, which could be constructed on the fly from the text string ‘var’ followed by the numeric values 0000 up to 1999.
Either way, a list of things in a file, or a programmatically generated list, involves scripting which is outside the scope of Notepad++.
I also believe that BOOST has issues with scanning an entire file for not-something. For example,
(?s-i)\A^((?!var0000).)*$
started misbehaving when a random text file was over 35,000 characters, and would also stop when it hit extended Unicode characters.
-
Hello, @fern99 , @terry-r, @mkupper and All,
@mkupper, as I’ve just finished a blog post about the use of the (*SKIP)(*F) feature, here is a regex which gets all the contents of any file which does NOT contain a specific string, anywhere :
(?s-i)\A.*var0000(*SKIP)(*F)|\A.*
It’s quite magic !
I tested it against a text file of 2,745,028 characters, containing 5,901 lines. And, of course, I re-saved the file each time I changed the location of the var0000 string ( case YES ) or the location of the var000 string ( case NO ), before running the search. So :
-
Whatever the location of the var0000 string, the above regex does NOT match anything
-
Whatever the location of the var000 string, the above regex DOES match all the text contents, so 2,745,040 characters, the same as Ctrl + A !
One IMPORTANT thing to understand is the current location of the regex engine when trying the left branch of the alternative (?s-i)\A.*var0000(*SKIP)(*F) :
-
If the left branch is matched, no backtracking is allowed because of the (*SKIP) verb, but the regex engine location has skipped to right after the last char of the range .*var0000. As no regex part exists between (*SKIP) and (*F), this location does NOT change. Then, the (*F) syntax discards the match so far. So, the regex engine tries the right branch of the alternative. But the \A location cannot be reached anymore. So, the whole process fails and NO match occurs, whatever the position of the var0000 string in the current file.
-
If the left branch is NOT matched, the engine location is STILL at the very beginning of the current file and the regex engine simply tries the right alternative \A.* which, this time, is indeed a successful match, selecting all the characters of the current file.
Of course, in this second case, as it selects everything, you could say that the regex engine is bugging and selects, by mistake, all the file contents. But luckily, it’s NOT the case ! To verify my assertion, just use, for example, the regex (?s-i)\A.*var0000(*SKIP)(*F)|\A.{10000} against a file of size > 10,000 bytes which does NOT contain the string var0000
=> You’ll see that, as expected, the FIRST 10,000 characters of the file have been selected, proving that the regex engine works properly in that matter !
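For readers who want to reproduce the anchored cases outside Notepad++: Python's built-in `re` module has no backtracking control verbs such as (*SKIP) or (*F), but a negative lookahead gives the same outcome when the pattern is anchored with \A. The sketch below is an emulation under that assumption, not the Boost engine itself; `match_if_absent` and its `tail` parameter are hypothetical names.

```python
import re


def match_if_absent(text, needle, tail=r".*"):
    r"""Emulate \A.*needle(*SKIP)(*F)|\A<tail> with a negative lookahead.

    Returns the matched text, or None when `needle` occurs anywhere in `text`.
    """
    # re.DOTALL plays the role of (?s); Python's re is case-sensitive by
    # default, which corresponds to -i.
    pattern = re.compile(r"\A(?!.*" + re.escape(needle) + r")" + tail, re.DOTALL)
    match = pattern.search(text)
    return match.group(0) if match else None


if __name__ == "__main__":
    without = "line 1\nvar000 only\nline 3"
    with_it = "line 1\nvar0000 here\nline 3"
    print(match_if_absent(without, "var0000") == without)  # whole text selected
    print(match_if_absent(with_it, "var0000"))             # no match at all
    print(match_if_absent(without, "var0000", r".{10}"))   # first 10 characters only
```

Passing `tail=r".{10}"` mirrors the \A.{10000} check above on a smaller scale: only the requested number of characters is selected, and only when the string is absent.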
Now, what would happen if we omit the \A assertion in the above regex ? Well :
-
If the left branch of the alternative is matched, as this match is cancelled, due to the (*SKIP)(*F) syntax, the engine normally tries the right branch .* and matches all the characters from after the string var0000, or from the current position, to the very end of file
-
If the left branch of the alternative is NOT matched, the engine location is STILL at the very beginning of the current file and the regex engine simply tries the right alternative .* which is a successful match, selecting all the characters from the current location till the very end of the file
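The unanchored case can be pictured with plain string operations. This is only a model of the behaviour reported in this post, under two assumptions of mine: the search starts at the very beginning of the file, and, because .* is greedy, the skip point lies after the LAST occurrence of the string. The helper name `selected_without_anchor` is hypothetical.

```python
def selected_without_anchor(text, needle="var0000"):
    r"""Model what \A.*needle(*SKIP)(*F)|.* selects when searched from the file start."""
    last = text.rfind(needle)
    if last == -1:
        # Left branch cannot match: the right branch .* takes everything.
        return text
    # Left branch matched and was discarded: the search resumes right after
    # the needle, where .* selects the rest of the file.
    return text[last + len(needle):]


if __name__ == "__main__":
    print(repr(selected_without_anchor("abc def")))          # 'abc def'
    print(repr(selected_without_anchor("abc var0000 def")))  # ' def'
```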
See the summary table, below :
•----------------------•-------------------------------------•-----------------------------------------------------------------------------------------•
| String 'var0000'     | Regex                               | RESULTS                                                                                 |
•----------------------•-------------------------------------•-----------------------------------------------------------------------------------------•
| NO ANYWHERE in file  | (?s-i)\A.*var0000(*SKIP)(*F)|\A     | Match the EMPTY location at the VERY BEGINNING of file                                  |
| NO ANYWHERE in file  | (?s-i)\A.*var0000(*SKIP)(*F)|\A.{n} | Match the FIRST N char(s) of file, if possible                                          |
| NO ANYWHERE in file  | (?s-i)\A.*var0000(*SKIP)(*F)|\A.*   | Match ALL chars of file                                                                 |
|                      |                                     |                                                                                         |
| NO anywhere in file  | (?s-i)\A.*var0000(*SKIP)(*F)|       | Match an EMPTY location at CURRENT position                                             |
| NO anywhere in file  | (?s-i)\A.*var0000(*SKIP)(*F)|.{n}   | Match N char(s) from CURRENT position, if possible                                      |
| NO anywhere in file  | (?s-i)\A.*var0000(*SKIP)(*F)|.*     | Match ALL chars from CURRENT position till the very END of file                         |
|                      |                                     |                                                                                         |
|                      |                                     |                                                                                         |
| YES ANYWHERE in file | (?s-i)\A.*var0000(*SKIP)(*F)|\A     | NO match at all                                                                         |
| YES ANYWHERE in file | (?s-i)\A.*var0000(*SKIP)(*F)|\A.{n} | NO match at all                                                                         |
| YES ANYWHERE in file | (?s-i)\A.*var0000(*SKIP)(*F)|\A.*   | NO match at all                                                                         |
|                      |                                     |                                                                                         |
| YES ANYWHERE in file | (?s-i)\A.*var0000(*SKIP)(*F)|       | Match an EMPTY location AFTER 'var0000' or at current position                          |
| YES ANYWHERE in file | (?s-i)\A.*var0000(*SKIP)(*F)|.{n}   | Match N char(s) from AFTER 'var0000' or from CURRENT position, if possible              |
| YES ANYWHERE in file | (?s-i)\A.*var0000(*SKIP)(*F)|.*     | Match ALL chars from AFTER 'var0000' or from CURRENT position till the very END of file |
•----------------------•-------------------------------------•-----------------------------------------------------------------------------------------•
An IMPORTANT thing to note is that the generic regex : Regex A(*SKIP)(*F)|Regex B is strictly identical to Regex B, when Regex A CANNOT match anything !
Best Regards,
guy038
-
-
Hi, @fern99, @terry-r, @mkupper and All,
Another example :
To list all files which contain NONE of the strings ABC, JKL and XYZ, in upper case, from your opened documents, use :
-
FIND
(?s-i)(?=\A.*(?:ABC|JKL|XYZ))(*SKIP)(*F)|\A.
-
Check the Wrap around option
-
Click on the Find All in All Opened Documents button
REMARK : Do keep the LAST regex dot ( . ), after \A. If omitted, consecutive searches with this regex do NOT give the same results ! ( Bug ? )
BR
guy038
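Outside Notepad++, the same "contains none of these strings" check can be sketched in Python. This is a rough equivalent under assumptions, not a replacement for the regex: `contains_none` and `files_without` are hypothetical helper names, the files are assumed to be UTF-8 text, and the comparison is case-sensitive, like (?-i).

```python
from pathlib import Path


def contains_none(text, needles):
    """True when none of the needles occurs anywhere in text (case-sensitive)."""
    return not any(needle in text for needle in needles)


def files_without(paths, needles, encoding="utf-8"):
    """Return the paths of the files whose text contains none of the needles."""
    result = []
    for path in paths:
        # Assumption: undecodable bytes are replaced rather than raising an error.
        text = Path(path).read_text(encoding=encoding, errors="replace")
        if contains_none(text, needles):
            result.append(str(path))
    return result


if __name__ == "__main__":
    needles = ["ABC", "JKL", "XYZ"]
    print(contains_none("nothing relevant here", needles))  # True
    print(contains_none("contains XYZ somewhere", needles))  # False
```

`files_without` takes any iterable of paths, for instance the list of files currently open in the editor, however that list is obtained.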
-