Bug when a multi-lines regex is used in the 'Search', 'Replace' or 'Mark' dialog
-
Hello, All,
I suppose that this bug was introduced when the `Minimum size for checking "In Selection"` value was added in `Preferences > Searching > When Find Dialog is Invoked`, in the `v8.5.8` release (point `#12`).
- Do a stream selection of the multi-line regex below :

    (?x-i)               # Search SENSIBLE to the CASE
    (?<=\x20)            # Preceded with a SPACE char
    (?:                  # BEGINNING of the NON-CAPTURING group
        LETTER         |
        CAPITAL        |
        SMALL          |
        DIGIT          |
        FRACTION       |
        LIGATURE       |
        SUPERSCRIPT    |
        SUBSCRIPT      |
        CIRCLED        |
        PARENTHESIZED  |
        MATHEMATICAL   |
        FULL[ ]STOP    |
        ROMAN          |
        EPACT
    )                    # END of the NON-CAPTURING group
    (?=\x20)             # Followed with a SPACE char

- Open the `Mark` dialog (`Ctrl + M`)
- Select the two options `Purge for each search` and `Wrap around`, only
- Choose the `Regular expression` mode
- Click on the `Mark All` button
Against the following text, you get the message `47 matches in entire file`

    | 0030 | DIGIT ZERO | 0 |
    | 0370 | GREEK CAPITAL LETTER HETA | Ͱ |
    | 0400 | CYRILLIC CAPITAL LETTER IE WITH GRAVE | Ѐ |
    | 0500 | CYRILLIC CAPITAL LETTER KOMI DE | Ԁ |
    | 10A0 | GEORGIAN CAPITAL LETTER AN | Ⴀ |
    | 1C80 | CYRILLIC SMALL LETTER ROUNDED VE | ᲀ |
    | 1D00 | LATIN LETTER SMALL CAPITAL A | ᴀ |
    | 1E00 | LATIN CAPITAL LETTER A WITH RING BELOW | Ḁ |
    | 2070 | SUPERSCRIPT ZERO | ⁰ |
    | 2C60 | LATIN CAPITAL LETTER L WITH DOUBLE BAR | Ⱡ |
    | 2D00 | GEORGIAN SMALL LETTER AN | ⴀ |
    | A640 | CYRILLIC CAPITAL LETTER ZEMLYA | Ꙁ |
    | A722 | LATIN CAPITAL LETTER EGYPTOLOGICAL ALEF | Ꜣ |
    | AB30 | LATIN SMALL LETTER BARRED ALPHA | ꬰ |
    | FB00 | LATIN SMALL LIGATURE FF | ﬀ |
    | FF10 | FULLWIDTH DIGIT ZERO | ０ |
    | 102E1 | COPTIC EPACT DIGIT ONE | 𐋡 |
    | 10500 | ELBASAN LETTER A | 𐔀 |
    | 10780 | MODIFIER LETTER SMALL CAPITAL AA | 𐞀 |
    | 1CCD6 | OUTLINED LATIN CAPITAL LETTER A | |
    | 1D400 | MATHEMATICAL BOLD CAPITAL A | 𝐀 |
    | 1DF00 | LATIN SMALL LETTER FENG DIGRAPH WITH TRILL | 𝼀 |
    | 1E030 | MODIFIER LETTER CYRILLIC SMALL A | 𞀰 |
    | 1F100 | DIGIT ZERO FULL STOP | 🄀 |
    | 1FBF0 | SEGMENTED DIGIT ZERO | 🯰 |

- Again, do a stream selection of the multi-line regex below

    (?x-i)                         # Search SENSIBLE to CASE
    (?<=\x20)                      # Preceded with SPACE
    (?:                            # Start NON-CAPTURING group
        0[0-2][0-9A-F][0-9A-F]   |
        03[7-9A-F][0-9A-F]       |
        04[0-9A-F][0-9A-F]       |
        05[0-8][0-9A-F]          |
        10[A-F][0-9A-F]          |
        1C[8-B][0-9A-F]          |
        1D[0-9AB][0-9A-F]        |
        1[EF][0-9A-F][0-9A-F]    |
        20[7-C][0-9A-F]          |
        21[0-8][0-9A-F]          |
        24[6-9A-F][0-9A-F]       |
        25[A-F][0-9A-F]          |
        27[0-B][0-9A-F]          |
        2C[6-9A-F][0-9A-F]       |
        2D[012][0-9A-F]          |
        A6[4-9][0-9A-F]          |
        A7[2-9A-F][0-9A-F]       |
        AB[3-6][0-9A-F]          |
        FB[01][0-9A-F]           |
        FF[0-5E][0-9A-F]         |
        102[EF][0-9A-F]          |
        105[0-2][0-9A-F]         |
        107[89AB][0-9A-F]        |
        1CC[DEF][0-9A-F]         |
        1D[4-7][0-9A-F][0-9A-F]  |
        1DF[0-9A-F][0-9A-F]      |
        1E0[3-8][0-9A-F]         |
        1F1[0-9A-F][0-9A-F]      |
        1FB[0-9A-F][0-9A-F]
    )                              # End NON-CAPTURING group
    (?=\x20)                       # Followed with SPACE
    # END MULTI-lines Regex

Note, in the status bar, that
`1,047` characters have been selected
- Open the `Mark` dialog (`Ctrl + M`)

Note that the last part of the combo box does NOT show the text `# END MULTI-lines Regex` but shows the previous text `# Followed with a SPACE char`
- Select the two options `Purge for each search` and `Wrap around`, only
- Choose the `Regular expression` mode
- Uncheck the `In selection` option
- Click on the `Mark All` button

The previous search is RE-run and the same matches occurred !
Now, RE-select the second multi-line regex, without including the last line `# END MULTI-lines Regex`
Note that, this time, the indication `1,022` characters is shown in the status bar
- Open the `Mark` dialog (`Ctrl + M`)

Note that, this time, the last part of the combo box shows the text `# Followed with SPACE` and NOT the previous text `# Followed with a SPACE char`. So, this second regex seems correctly taken into account !
- Select the two options `Purge for each search` and `Wrap around`, only
- Choose the `Regular expression` mode
- Click on the `Mark All` button

=> This time, the marked text is, as expected, all the hexadecimal values beginning each line, and the message says `25 matches in entire file` ! The process is OK because the selection is under the limit of `1,024` bytes.
To my mind, it would be best to increase the `1,024` value, used to automatically check the `In selection` option, to `2,048`, which roughly corresponds to the maximum number of characters that the Find dialog may contain !
Best Regards,
guy038
P.S. : I came across this bug when preparing my post about the new `Locale Order` feature !
-
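A quick arithmetic check on the two selection sizes quoted in the opening post (`1,047` and `1,022` characters). This is only a sketch: it assumes that the last regex line is written with single spaces and that the tab uses Windows (CRLF) line endings, neither of which is stated explicitly in the post.

```python
# Assumption: the final comment line is exactly this text, followed by CR+LF.
last_line = "# END MULTI-lines Regex"
line_ending = "\r\n"

with_last_line = 1047      # status bar value quoted for the full selection
without_last_line = 1022   # status bar value quoted without the last line

# Characters removed when the last line (and one line ending) is left out
removed = len(last_line) + len(line_ending)

print(removed)                              # 25
print(with_last_line - without_last_line)   # 25
```

Under those assumptions, the only difference between the two selections is that one comment line, which puts the full regex just above `1,024` characters and the shortened one just below.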
@guy038, I decided to try a different test.
I had three lines of text

    aaa... repeated 1024 times.
    bbb... repeated 1025 times.
    ccc... repeated 32768 times.

- No text is selected.
- Position the caret on the first line and do `Ctrl+F`. The Find dialog will pop up with the Find field populated with `aaa...aaa`.
- Close the dialog box and try again on line 2. The Find dialog pops up again but the Find field still has `aaa...aaa` and not the expected `bbb...bbb`

The behavior changes from 1024 to 1025 characters in the selection. It’s a sort of well-known issue, and it makes a multi-line `(?x)` free-form search painful, as the only way around the 1024 character limit is to copy/paste the text into the Find field.
While the magic number is 1024, this is apparently unrelated to the magic number 1024 found in the `Settings / Preferences / Searching (tab)` setting for `Minimum Size for Auto-Checking "In-selection"`.
As you can get around the 1024 limit by copy/pasting into the Find field, I used the `ccc...ccc` line to test this. I loaded line 3 into the copy/paste buffer, did a `Ctrl+F` to bring up the search dialog, and then pasted into the Find field.
I discovered that the Find field is limited to 2046 characters.
- Do a search for `ccc...ccc` and you will discover that it matches and selects 2046 characters from line 3. It does not matter if you use Normal, Extended, or Regular Expression mode.
- If you re-activate the Find dialog you will discover that the Find field has 2046 characters in it.

There is another upper limit which is that the Find field allows for up to 30,000 characters. You can’t paste more than 30,000 characters into the field. FWIW, Microsoft Notepad’s Find field is limited to 127 characters.
If you put in a feature request, then I’m inclined to vote for an upper limit of 30,000 characters, so that a selection of up to 30,000 characters will auto-populate the Find field. Hopefully, both the normal and extended search allow for 30,000 character searches.
We know that regular expression mode has a much lower limit but I don’t see a clean way to impose a smaller limit for that mode while allowing for switching modes within the dialog box.
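The three thresholds reported in this post can be summarized in a small model. This is only a sketch of the behavior observed from the outside, not Notepad++'s actual code: the constant and function names are made up for illustration, and the numbers are the ones measured above with plain ASCII text.

```python
# Observed thresholds (plain ASCII tests); the names are hypothetical.
AUTO_LOAD_MAX = 1024    # longest selection that still auto-populates the Find field
SEARCH_MAX = 2046       # number of characters a search actually uses
FIELD_MAX = 30000       # number of characters the Find field accepts on paste


def find_field_behaviour(length):
    """Describe what was observed for a search text of `length` characters."""
    return {
        "auto_loaded": length <= AUTO_LOAD_MAX,
        "pasted_length": min(length, FIELD_MAX),
        "searched_length": min(length, SEARCH_MAX),
    }


for n in (1024, 1025, 32768):
    print(n, find_field_behaviour(n))
```

For the three test lines, this gives an auto-loaded 1024 character line, a 1025 character line that is not auto-loaded, and a 32768 character line that pastes as 30000 characters and searches as 2046.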
-
Hello, @mkupper and All,
Really sorry, but I’m rather confused !
@mkupper, you said :
I discovered that the Find field is limited to `2046` characters.
And two lines below, you said :
There is another upper limit which is that the Find field allows for up to `30,000` characters
To my mind, the former number is correct, no ??!!
I also verified, on my old `Win-XP` laptop, with the last XP version of N++ (`v7.9.2`) :
- That it is possible to use a multi-line regex of up to `2,046` characters
- That the automatic check of the `In selection` option, although NOT configurable in the `Preferences...` dialog at that time, is effective for selections of `1,025` characters or more

So, at the time of the `v7.9.2` release, the regex limit of chars and the automatic `In selection` limit seemed unrelated ! Not sure that it’s still the case, nowadays ?
Best Regards,
guy038
-
-
Note, in the status bar, that 1,047 characters have been selected
Trying to follow along, I don’t see how the above happens.
-
Hi, @alan-kilborn, and All,
Oh, yes, sorry @alan-kilborn, it’s a typo : I generally use the `~~~` string to begin and end a text block. But, this time, I forgot one tilde for a block end :-((
I edited my first post and corrected it !
So, just retry and copy the third text section, of my initial post, in the clipboard with the upper-right corner button. It should be OK !
BR
guy038
-
@mkupper said:
There is another upper limit which is that the Find field allows for up to 30,000 characters. You can’t paste more than 30,000 characters into the field.
If you’re speaking slightly sloppily, then this makes sense. I’d guess that the limit is actually 32767, the default Windows value for an edit control, see HERE.
But isn’t it true that in Notepad++, even though you can put more than 2046 characters in the Find what box (e.g. via pasting), it ignores anything over 2046 when executing the search?
Side Note: Also, what Notepad++ uses as a limit might be 2046 bytes, not characters in a strict sense. I haven’t looked at this lately, but if memory serves if you use multibyte characters in the Find what box data, the limit is going to be less than 2046.
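To illustrate the bytes-versus-characters point, here is a minimal sketch. It assumes, without confirmation from the Notepad++ source, that the 2046 limit is counted in UTF-8 bytes; the function names are hypothetical.

```python
SEARCH_LIMIT = 2046  # limit observed in this thread; its unit is unconfirmed


def utf8_length(text):
    """Number of bytes the text occupies once encoded as UTF-8."""
    return len(text.encode("utf-8"))


def fits_if_limit_is_bytes(text, limit=SEARCH_LIMIT):
    """True if the text would fit, ASSUMING the limit is in UTF-8 bytes."""
    return utf8_length(text) <= limit


ascii_text = "a" * 2046
greek_text = "\u0370" * 2046   # GREEK CAPITAL LETTER HETA, 2 bytes in UTF-8

print(len(ascii_text), utf8_length(ascii_text))   # 2046 2046
print(len(greek_text), utf8_length(greek_text))   # 2046 4092
```

If the assumption holds, a pattern made only of two-byte characters would be cut after 1023 of them, which is consistent with the limit being "less than 2046" for multibyte data.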
-
@Alan-Kilborn said in Bug when a multi-lines regex is used in the 'Search', 'Replace' or 'Mark' dialog:
If you’re speaking slightly sloppily, then this makes sense. I’d guess that the limit is actually 32767, the default Windows value for an edit control, see HERE .
I hope it was not that sloppy. Here are the repro steps for what I did yesterday, though I suspect including the repro details makes this a TL;DR style post.
-
I am running v8.8.1 though I don’t think the version matters much as all of this repro also worked in v8.7.9.
-
I used Excel and Notepad++ to construct some “rulers” that start with the length and have markers. The rulers are 1024, 1025, and 70000 characters long.
I also included a line with the word `random`, which is a word that I use to pre-load the `Find what` field at times.

    random

    1024____10________20________30________40________50________60________70________80________90_______100_______110_______120_______130_______140_______150_______160_______170_______180_______190_______200_______210_______220_______230_______240_______250_______260_______270_______280_______290_______300_______310_______320_______330_______340_______350_______360_______370_______380_______390_______400_______410_______420_______430_______440_______450_______460_______470_______480_______490_______500_______510_______520_______530_______540_______550_______560_______570_______580_______590_______600_______610_______620_______630_______640_______650_______660_______670_______680_______690_______700_______710_______720_______730_______740_______750_______760_______770_______780_______790_______800_______810_______820_______830_______840_______850_______860_______870_______880_______890_______900_______910_______920_______930_______940_______950_______960_______970_______980_______990______1000______1010______10201024
    1025____10________20________30________40________50________60________70________80________90_______100_______110_______120_______130_______140_______150_______160_______170_______180_______190_______200_______210_______220_______230_______240_______250_______260_______270_______280_______290_______300_______310_______320_______330_______340_______350_______360_______370_______380_______390_______400_______410_______420_______430_______440_______450_______460_______470_______480_______490_______500_______510_______520_______530_______540_______550_______560_______570_______580_______590_______600_______610_______620_______630_______640_______650_______660_______670_______680_______690_______700_______710_______720_______730_______740_______750_______760_______770_______780_______790_______800_______810_______820_______830_______840_______850_______860_______870_______880_______890_______900_______910_______920_______930_______940_______950_______960_______970_______980_______990______1000______1010______1020_1025
    (70000 character test string removed as the forum does not allow for more than 16384 character long posts)

At times I’ll say to preload the random word into the `Find what` field. By this I mean:
- Select or put the caret on the word `random` (or anything of your choosing).
- `Ctrl-F` to bring up the Find dialog box.
- See that the desired word is in the `Find what` field
- Press `Esc` to close the Find dialog box.
- Move the caret to a blank area of the document. (that’s why I have a blank line above and below the word `random` in the test data.)
- You may do another `Ctrl-F` to bring up the Find dialog box and the `Find what` field should have the random word in it.

**The 30000 character `Find what` field limit**
1. Preload the random word into the `Find what` field.
2. Load the 70000 character ruler (without the end of line) into the copy/paste buffer.
3. Move the caret to a blank area so that it’s not sitting on the ruler or some other word.
4. `Ctrl+F` to bring up the Find dialog box and then `Ctrl+V` to paste the ruler into the `Find what` field.
5. You should see that the `Find what` field starts with `70000___10...` and ends with `..._____29980_____29990_____30000`
6. `Ctrl+A` to select all of the `Find what` field contents, `Ctrl+C` to load that into the copy/paste buffer, `Esc` to close the Find dialog box, and `Ctrl+V` to paste the results into the Notepad++ document.
7. You should see a 30000 character long line that starts with `70000___10` and ends with `_____30000`.

**The 1024 character automatic select and load into `Find what` limit**
This is the item that started this forum thread.
1. Preload the random word into the `Find what` field.
2. Put the caret on the 1025 character ruler and then do `Ctrl-F` to see what’s in `Find what`. You will see it’s still the pre-loaded random word.
3. Try various things such as selecting the ruler and then doing `Ctrl+F`. You will still get the pre-loaded random word.
4. Repeat steps 1, 2 and 3 using the 1024 character long ruler. You will discover that this ruler loads into the `Find what` field.

**The 2046 character limit for Notepad++ searches**
1. First do steps 1 to 5 of the "The 30000 character `Find what` field limit" repro that’s above.
2. Do `Enter` (or click Find Next) to search for whatever is in the `Find what` field and then press `Esc`.
3. You will see that the first 2046 characters of the 70000 character ruler are selected. I did a `Ctrl+C` and pasted that to its own line to verify that it’s a 2046 character long line.
4. Do `F3` and Notepad++'s search will continue to find/select the first 2046 characters of the 70000 character ruler. (You can make extra copies of this ruler if desired)

**Bonus on the 2046 character limit for Notepad++ searches**
I wondered if I could trick Notepad++ into using more than 2046 characters and so tried this:
- I exited Notepad++, edited the config.xml file, and added `2050______2060` to the `<Find name="70000___10________20` line so that it ends with `2040______2050______2060" />`
- I started Notepad++, went to a blank area, and did `Ctrl+F`. I discovered that `Find what` is pre-loaded with a 2060 character long value that starts with `70000___10________20` and ends with `2040______2050______2060`.
- Searches though are still limited to 2046 characters.

**Bonus on the `F3` search**
I discovered that Notepad++ must have a separate internal buffer that it uses for the `F3` search. If you start Notepad++ and then do `F3`, nothing happens, even though the top of the find history is something that should be in the document.
Related to this: if you preload something into the find history, it’s not available for an `F3` search. For example, put the caret on the word `Notepad`, do `Ctrl-F` and then `Esc`. Tapping `F3` will not search for `Notepad`; instead it searches for whatever you had last searched for.
I ran into this as I was hoping to preload the 2060 character long string by editing config.xml, starting Notepad++, and then doing an `F3` to see how long the resulting selection was. Nothing happened as there was nothing in the "text to search for" buffer. Thus I could not use this method to fool Notepad++ into searching for more than 2046 characters.
Also, when you either pre-load or copy/paste something into the `Find what` field and then exit Notepad++ without ever searching for that value, it’s not saved to the config.xml file.
- Thus, while you can copy/paste a 30000 character long string so that it shows up at the top of the search history, this will not get saved to the config.xml file.
- If you do a search for that 30000 character long string then it seems that it’s first truncated to 2046 characters and it then does the search. I believe it truncates first as the `Find what` field is truncated on the spot when you click `[Find Next]`
- Thus the dialog box does not let you search for more than 2046 characters.
- The truncated value will also now be at the top of the search history, and when you exit Notepad++ the 2046 character long string gets written to the config.xml file.

I did not do any testing with Notepad++ macros or PythonScript to see if I could use more than 2046 characters in a search pattern.
But isn’t it true that in Notepad++, even though you can put more than 2046 characters in the Find what box (e.g. via pasting), it ignores anything over 2046 when executing the search?
That seems to be true and it also truncates the field to 2046 characters when adding it to the search history.
Side Note: Also, what Notepad++ uses as a limit might be 2046 bytes, not characters in a strict sense. I haven’t looked at this lately, but if memory serves if you use multibyte characters in the Find what box data, the limit is going to be less than 2046.
That’s correct which is why I used plain ASCII for these tests. I’ve forgotten if the limits in this area are related to UTF-8 encoding and/or some characters need more bits than the seven needed for plain ASCII.
-
-
Follow-up on the previous post, as the forum software did not allow for a 70000 character long ruler style test string. I had used Excel, with these formulas in columns A and B:

    10          =REPT("_",10-LEN(A1))&TEXT(A1,"0")
    =A1+10      =REPT("_",10-LEN(A2))&TEXT(A2,"0")
    =A2+10      =REPT("_",10-LEN(A3))&TEXT(A3,"0")
    =A3+10      =REPT("_",10-LEN(A4))&TEXT(A4,"0")

...repeat that for 7000 rows. Row 7000 has:

    =A6999+10   =REPT("_",10-LEN(A7000))&TEXT(A7000,"0")

with the result being:

    10      ________10
    20      ________20
    30      ________30
    ...
    70000   _____70000

I then copy/pasted column B into Notepad++, verified 7000 lines, and then did a search/replace to remove the `\R` to generate:

    ________10________20________30 ... _____70000

-
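For readers without Excel, the same construction (each multiple of 10, right-aligned in a 10 character field padded with underscores, then all fields joined) can be sketched in plain Python. The function name `excel_style_ruler` is made up for illustration; only the padding rule and the 70000 length come from the post above.

```python
def excel_style_ruler(total_len=70000, step=10):
    """Right-align each multiple of `step` in a `step`-wide field of underscores.

    Mirrors the Excel formula REPT("_", 10 - LEN(n)) & TEXT(n, "0"),
    with all rows joined into a single line.
    """
    return "".join(str(n).rjust(step, "_") for n in range(step, total_len + 1, step))


ruler = excel_style_ruler()
print(len(ruler))     # 70000
print(ruler[:30])     # ________10________20________30
print(ruler[-10:])    # _____70000
```

Because every field ends exactly on a multiple of 10, the last number visible in any truncated copy of the ruler tells you, to within 10 characters, how long that copy is.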
Hello, @mkupper, @alan-kilborn and All,
@mkupper, I repeated your whole process and, indeed, your method and explanations were very instructive !
I just do NOT understand one point, yet. You said, in a previous post :
Hopefully, both the normal and extended search allow for `30,000` character searches.
Well, repeating the points 1 to 5 of the "The 30000 character `Find what` field limit" section, with the Find dialog pre-configured in Normal or Extended mode, it just matches the first `2,046` characters of the `70000___10` string, although the Find field does contain the first `30,000` chars of the `70000___10` string ?!
Best regards,
guy038
-
@mkupper said:
**The 1024 character automatic select and load into Find what limit**
This is the item that started this forum thread.
As this IS what started the thread, I’d like to concentrate on that specific aspect.
While the magic number is 1024 this is apparently unrelated to the magic number 1024 found in Settings / Preferences / Searching (tab) setting for Minimum Size for Auto-Checking “In-selection”.
To evaluate the “apparently unrelated” part, I started changing the value for Minimum Size for Auto-Checking “In-selection”. And, unless I missed something, it does actually appear related. For example, creating 1022 and 1023 “ruler” lines and then changing the number in the box to 1023 yields what I’d expect (based on the 1024 behavior): That is, invoking Ctrl+f with the caret in the 1023 ruler checkmarks In selection, while invoking it with the caret in the 1022 ruler uncheckmarks it.
Thus I don’t think that 1024 is a “magic number”; it’s a default setting value, with no magic.
But I still feel like I’m missing something about the point @mkupper was trying to make about this.
-
30000 vs. 32767
I’ve no idea where 30000 originates. A quick search of the Notepad++ source code won’t find it literally. So… apologies to @mkupper about the “sloppily” thing; I did think you were speaking in “ballpark” terms.
-
Instead of Excel, why not use a bit of PythonScript to generate the “ruler” lines?:

    accum = ''
    for j in range(1020, 1030 + 1):
        desired_len = j
        des_len_as_str = str(desired_len)
        s = des_len_as_str  # each ruler starts with its own length
        tens_count = 0
        while True:
            # a marker number starts in columns 10, 20, 30, ...
            if (len(s) + 1) % 10 == 0:
                # skip the final marker when it would collide with the trailing length
                if (tens_count + 2) * 10 <= desired_len:
                    s += str((tens_count + 1) * 10)
                    tens_count += 1
            if len(s) >= desired_len: break
            s += '_'
        # overwrite the tail so that the ruler also ends with its own length
        s = s[:-len(des_len_as_str)] + des_len_as_str
        accum += s + '\r\n'
    editor.copyText(accum)  # put all the rulers in the clipboard

The example above generates ruler lines of length `1020` through `1030`, inclusive. The ruler data ends up in the clipboard after the script runs.
Note that mine might be different from the earlier ruler lines discussed – I chose that the intermediate numbers start in their indicated column, e.g. after you paste the output of the script into a new tab, if you put the caret just to the left of the `8` in `890`, the status bar will indicate `Col: 890`.
To select 890 characters from that same example line, put the caret between the `8` and the `9` and then press Shift+Home.
Here’s some output from the script:
1020_____10________20________30________40________50________60________70________80________90________100_______110_______120_______130_______140_______150_______160_______170_______180_______190_______200_______210_______220_______230_______240_______250_______260_______270_______280_______290_______300_______310_______320_______330_______340_______350_______360_______370_______380_______390_______400_______410_______420_______430_______440_______450_______460_______470_______480_______490_______500_______510_______520_______530_______540_______550_______560_______570_______580_______590_______600_______610_______620_______630_______640_______650_______660_______670_______680_______690_______700_______710_______720_______730_______740_______750_______760_______770_______780_______790_______800_______810_______820_______830_______840_______850_______860_______870_______880_______890_______900_______910_______920_______930_______940_______950_______960_______970_______980_______990_______1000______1010___1020 
1021_____10________20________30________40________50________60________70________80________90________100_______110_______120_______130_______140_______150_______160_______170_______180_______190_______200_______210_______220_______230_______240_______250_______260_______270_______280_______290_______300_______310_______320_______330_______340_______350_______360_______370_______380_______390_______400_______410_______420_______430_______440_______450_______460_______470_______480_______490_______500_______510_______520_______530_______540_______550_______560_______570_______580_______590_______600_______610_______620_______630_______640_______650_______660_______670_______680_______690_______700_______710_______720_______730_______740_______750_______760_______770_______780_______790_______800_______810_______820_______830_______840_______850_______860_______870_______880_______890_______900_______910_______920_______930_______940_______950_______960_______970_______980_______990_______1000______1010____1021 
1022_____10________20________30________40________50________60________70________80________90________100_______110_______120_______130_______140_______150_______160_______170_______180_______190_______200_______210_______220_______230_______240_______250_______260_______270_______280_______290_______300_______310_______320_______330_______340_______350_______360_______370_______380_______390_______400_______410_______420_______430_______440_______450_______460_______470_______480_______490_______500_______510_______520_______530_______540_______550_______560_______570_______580_______590_______600_______610_______620_______630_______640_______650_______660_______670_______680_______690_______700_______710_______720_______730_______740_______750_______760_______770_______780_______790_______800_______810_______820_______830_______840_______850_______860_______870_______880_______890_______900_______910_______920_______930_______940_______950_______960_______970_______980_______990_______1000______1010_____1022 
1023_____10________20________30________40________50________60________70________80________90________100_______110_______120_______130_______140_______150_______160_______170_______180_______190_______200_______210_______220_______230_______240_______250_______260_______270_______280_______290_______300_______310_______320_______330_______340_______350_______360_______370_______380_______390_______400_______410_______420_______430_______440_______450_______460_______470_______480_______490_______500_______510_______520_______530_______540_______550_______560_______570_______580_______590_______600_______610_______620_______630_______640_______650_______660_______670_______680_______690_______700_______710_______720_______730_______740_______750_______760_______770_______780_______790_______800_______810_______820_______830_______840_______850_______860_______870_______880_______890_______900_______910_______920_______930_______940_______950_______960_______970_______980_______990_______1000______1010______1023 
1024_____10________20________30________40________50________60________70________80________90________100_______110_______120_______130_______140_______150_______160_______170_______180_______190_______200_______210_______220_______230_______240_______250_______260_______270_______280_______290_______300_______310_______320_______330_______340_______350_______360_______370_______380_______390_______400_______410_______420_______430_______440_______450_______460_______470_______480_______490_______500_______510_______520_______530_______540_______550_______560_______570_______580_______590_______600_______610_______620_______630_______640_______650_______660_______670_______680_______690_______700_______710_______720_______730_______740_______750_______760_______770_______780_______790_______800_______810_______820_______830_______840_______850_______860_______870_______880_______890_______900_______910_______920_______930_______940_______950_______960_______970_______980_______990_______1000______1010_______1024 
1025_____10________20________30________40________50________60________70________80________90________100_______110_______120_______130_______140_______150_______160_______170_______180_______190_______200_______210_______220_______230_______240_______250_______260_______270_______280_______290_______300_______310_______320_______330_______340_______350_______360_______370_______380_______390_______400_______410_______420_______430_______440_______450_______460_______470_______480_______490_______500_______510_______520_______530_______540_______550_______560_______570_______580_______590_______600_______610_______620_______630_______640_______650_______660_______670_______680_______690_______700_______710_______720_______730_______740_______750_______760_______770_______780_______790_______800_______810_______820_______830_______840_______850_______860_______870_______880_______890_______900_______910_______920_______930_______940_______950_______960_______970_______980_______990_______1000______1010________1025 
1026_____10________20________30________40________50________60________70________80________90________100_______110_______120_______130_______140_______150_______160_______170_______180_______190_______200_______210_______220_______230_______240_______250_______260_______270_______280_______290_______300_______310_______320_______330_______340_______350_______360_______370_______380_______390_______400_______410_______420_______430_______440_______450_______460_______470_______480_______490_______500_______510_______520_______530_______540_______550_______560_______570_______580_______590_______600_______610_______620_______630_______640_______650_______660_______670_______680_______690_______700_______710_______720_______730_______740_______750_______760_______770_______780_______790_______800_______810_______820_______830_______840_______850_______860_______870_______880_______890_______900_______910_______920_______930_______940_______950_______960_______970_______980_______990_______1000______1010_________1026 
1027_____10________20________30________40________50________60________70________80________90________100_______110_______120_______130_______140_______150_______160_______170_______180_______190_______200_______210_______220_______230_______240_______250_______260_______270_______280_______290_______300_______310_______320_______330_______340_______350_______360_______370_______380_______390_______400_______410_______420_______430_______440_______450_______460_______470_______480_______490_______500_______510_______520_______530_______540_______550_______560_______570_______580_______590_______600_______610_______620_______630_______640_______650_______660_______670_______680_______690_______700_______710_______720_______730_______740_______750_______760_______770_______780_______790_______800_______810_______820_______830_______840_______850_______860_______870_______880_______890_______900_______910_______920_______930_______940_______950_______960_______970_______980_______990_______1000______1010__________1027 
1028_____10________20________30________40________50________60________70________80________90________100_______110_______120_______130_______140_______150_______160_______170_______180_______190_______200_______210_______220_______230_______240_______250_______260_______270_______280_______290_______300_______310_______320_______330_______340_______350_______360_______370_______380_______390_______400_______410_______420_______430_______440_______450_______460_______470_______480_______490_______500_______510_______520_______530_______540_______550_______560_______570_______580_______590_______600_______610_______620_______630_______640_______650_______660_______670_______680_______690_______700_______710_______720_______730_______740_______750_______760_______770_______780_______790_______800_______810_______820_______830_______840_______850_______860_______870_______880_______890_______900_______910_______920_______930_______940_______950_______960_______970_______980_______990_______1000______1010___________1028 
1029_____10________20________30________40________50________60________70________80________90________100_______110_______120_______130_______140_______150_______160_______170_______180_______190_______200_______210_______220_______230_______240_______250_______260_______270_______280_______290_______300_______310_______320_______330_______340_______350_______360_______370_______380_______390_______400_______410_______420_______430_______440_______450_______460_______470_______480_______490_______500_______510_______520_______530_______540_______550_______560_______570_______580_______590_______600_______610_______620_______630_______640_______650_______660_______670_______680_______690_______700_______710_______720_______730_______740_______750_______760_______770_______780_______790_______800_______810_______820_______830_______840_______850_______860_______870_______880_______890_______900_______910_______920_______930_______940_______950_______960_______970_______980_______990_______1000______1010____________1029 
1030_____10________20________30________40________50________60________70________80________90________100_______110_______120_______130_______140_______150_______160_______170_______180_______190_______200_______210_______220_______230_______240_______250_______260_______270_______280_______290_______300_______310_______320_______330_______340_______350_______360_______370_______380_______390_______400_______410_______420_______430_______440_______450_______460_______470_______480_______490_______500_______510_______520_______530_______540_______550_______560_______570_______580_______590_______600_______610_______620_______630_______640_______650_______660_______670_______680_______690_______700_______710_______720_______730_______740_______750_______760_______770_______780_______790_______800_______810_______820_______830_______840_______850_______860_______870_______880_______890_______900_______910_______920_______930_______940_______950_______960_______970_______980_______990_______1000______1010______1020___1030 -
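The ruler logic of the PythonScript above can also be exercised outside Notepad++. The following is a sketch that wraps the same loop in a plain Python function (the name `make_ruler` is made up here) and checks the two properties described in the post: each ruler has exactly its stated length, and each intermediate number starts in the column it names.

```python
def make_ruler(desired_len):
    """Build one ruler line: it starts and ends with its own length,
    and every intermediate number N starts in column N (1-based)."""
    label = str(desired_len)
    s = label
    tens_count = 0
    while True:
        # A marker starts in columns 10, 20, 30, ... unless it would
        # collide with the length written at the end of the line.
        if (len(s) + 1) % 10 == 0 and (tens_count + 2) * 10 <= desired_len:
            s += str((tens_count + 1) * 10)
            tens_count += 1
        if len(s) >= desired_len:
            break
        s += "_"
    # Overwrite the tail with the length itself
    return s[:-len(label)] + label


for n in range(1020, 1031):
    assert len(make_ruler(n)) == n

r = make_ruler(1024)
print(r[:11])          # 1024_____10
print(r[889:892])      # 890
print(r[-15:])         # 1010_______1024
```

Since no `editor` object is needed, this variant can be run with any regular Python 3 interpreter and the result pasted into Notepad++ afterwards.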
Hello, @mkupper, @alan-kilborn and All,
I ran some tests and found a work-around that allows multi-lines regexes of more than 1,024 characters, as long as, of course, the total number of chars does not exceed 2,047 characters. All the tests were done with the latest N++ release, v8.8.1.
Here is my method, to be followed rigorously.
In a new tab, paste, successively, the two MULTI-lines regexes, below :
(?x-i) # Search SENSIBLE to CASE (?<=\x20) # Preceded with SPACE (?: # Start NON-CAPTURING group 0[0-2][0-9A-F][0-9A-F] | 03[7-9A-F][0-9A-F] | 04[0-9A-F][0-9A-F] | 05[0-8][0-9A-F] | 10[A-F][0-9A-F] | 1C[8-B][0-9A-F] | 1D[0-9AB][0-9A-F] | 1[EF][0-9A-F][0-9A-F] | 20[7-C][0-9A-F] | 21[0-8][0-9A-F] | 24[6-9A-F][0-9A-F] | 25[A-F][0-9A-F] | 27[0-B][0-9A-F] | 2C[6-9A-F][0-9A-F] | 2D[012][0-9A-F] | A6[4-9][0-9A-F] | A7[2-9A-F][0-9A-F] | AB[3-6][0-9A-F] | FB[01][0-9A-F] | FF[0-5E][0-9A-F] | 102[EF][0-9A-F] | 105[0-2][0-9A-F] | 107[89AB][0-9A-F] | 1CC[DEF][0-9A-F] | 1D[4-7][0-9A-F][0-9A-F] | 1DF[0-9A-F][0-9A-F] | 1E0[3-8][0-9A-F] | 1F1[0-9A-F][0-9A-F] | 1FB[0-9A-F][0-9A-F] ) # End NON-CAPTURING group (?=\x20) # Followed with SPACEAnd :
(?x-si) (?: ^ ~~~ [\h\l]* \R (?s:.+?) ^ ~~~ \h* \R | ^ ``` [\h\l]* \R (?s:.+?) ^ ``` \h* \R | ^ \h* (?: - \h* )? (?i: FIND | SEARCH | REPLACE ) \x20 .+ | (?<= \s ) (?: # Case *......* / Case **......** / Case `......` \* [^`* \t\r\n] (?: [^*\r\n]* [^`* \t\r\n] )? \* | \*\* [^`* \t\r\n] (?: [^*\r\n]* [^`* \t\r\n] )? \*\* | ` [^` \t\r\n] (?: [^\r\n]* [^` \t\r\n] )? ` | # Case **`......`** \*\* ` [^` \t\r\n] (?: [^`\r\n]* [^` \t\r\n] )? ` \*\* | \*\* ` [^`\r\n] (?: [^`\r\n]* [^`\r\n] )? ` (?: [^*`\r\n] [^`\r\n]* )? [^` \t\r\n] \*\* | \*\* [^` \t\r\n] (?: [^`\r\n]* [^*`\r\n] )? ` [^`\r\n] (?: [^`\r\n]* [^`\r\n] )? ` \*\* | \*\* [^` \t\r\n] (?: [^`\r\n]* [^*`\r\n] )? ` [^`\r\n] (?: [^`\r\n]* [^`\r\n] )? ` (?: [^*`\r\n] [^`\r\n]* )? [^` \t\r\n] \*\* | # Case *`......`* \* ` [^` \t\r\n] (?: [^`\r\n]* [^` \t\r\n] )? ` \* | \* ` [^`\r\n] (?: [^`\r\n]* [^`\r\n] )? ` (?: [^*`\r\n] [^*`\r\n]* )? [^` \t\r\n] \* | \* [^` \t\r\n] (?: [^*`\r\n]* [^`*\r\n] )? ` [^`\r\n] (?: [^`\r\n]* [^`\r\n] )? ` \* | \* [^` \t\r\n] (?: [^*`\r\n]* [^`*\r\n] )? ` [^`\r\n] (?: [^`\r\n]* [^`\r\n] )? ` (?: [^*`\r\n] [^*`\r\n]* )? [^` \t\r\n] \* ){1}+ (?= [\s,;.-] | 's | \z ) ) (*SKIP) (*F) # CORRECT cases are IGNORED | [*`]+ .+? [*`]+ | [*`]+ \x20* .+? \x20+Note that the first regex contains
1,022 characters and the second contains 1,802 chars, so one regex is shorter than 1,024 characters and the other is longer than 1,024.
Add, in this new tab, the two lines below ( the first line should be matched by the first regex and the second line should be matched by the second regex ).
2C63 | LATIN CAPITAL LETTER P WITH STROKE | Ᵽ
*AB D *
- To end, save this new tab as Test_RE.txt
First manipulation :
- Switch to the Test_RE.txt file
- Select the contents of the first regex ( 1,022 chars )
- Open the Find dialog ( Ctrl + F )
=> As you can see, the end of this multi-lines regex is visible and you should note the part # Followed with SPACE
- Uncheck all the box options, if any ( note that the In selection option was already not checked )
- Note that the Wrap around option will stay unchecked during all the tests
- Select the Regular expression search mode
- Click on the Find Next button
=> As expected, it matches the 2C63 string
- Close the Find dialog
Second manipulation :
- Switch to the Test_RE.txt file
- This time, select the contents of the second regex ( 1,802 chars )
- Open the Find dialog ( Ctrl + F )
=> You can observe two things :
  - The In selection option is checked ( Logical, as the amount of chars is greater than 1,024 )
  - Surprisingly, the regex, shown in the search field, is STILL the first regex and not the expected second multi-lines regex !
- Hit the Find Next button : as expected, it returns, again, the 2C63 string ( Note that the In selection option is NOT taken into account when you hit the Find Next button )
- Close the Find dialog
At this point, you can, either :
- Re-open the Find dialog
- Close and re-load the Test_RE.txt file
- Close and re-start Notepad++
Whichever you choose, if you re-open the Find dialog, the first regex, ending with the string # Followed with SPACE, is STILL present in the search field, although we previously selected the second regex ???!!! Why ?
Third manipulation :
- Switch to the Test_RE.txt file
- Select, again, the contents of the second regex ( 1,802 chars )
- Before opening the Find dialog, just hit the Ctrl + F3 shortcut ( This is the WORK-AROUND ! )
- Open the Find dialog
- This time, you can verify that the search field contains the correct regex : the second one
- Hit the Find Next button => This time, it correctly matches the *AB D * string
Note that sometimes, I needed to cancel the selection, right before opening the Find dialog, in order to get this match !
It’s important to add that changing, in Preferences... > Searching > When Find Dialog is Invoked, the value 1,024 to the value 0 does NOT change the global behavior.
As a conclusion, I would say that all that logic seems rather unclear. Can anyone reproduce these steps and see the problem ?
Best Regards,
guy038
P.S. :
Note that, without this work-around, I would have had to use the v8.4.9 release, which is the last version before the auto-checking of the In selection option
-
@guy038 said:
Second manipulation :
…
…select the contents of the second regex ( 1,802 chars )
…
The In selection option is checked ( Logical, as the amount of chars is greater than 1,024 )
…
Surprisingly, the regex, shown in the search field, is STILL the first regex and not the second expected multi-lines regex !
…
Whatever you decided, if you re-open the Find dialog, the first regex, ending with the string # Followed with SPACE, is STILL present in the search field, although we previously selected the second regex ???!!! Why ?
The way it works (I think) is that if In selection is going to become checkmarked due to the N++ code doing it, i.e., if the number of bytes in the selected text is greater-than-or-equal-to¹ the setting “Minimum Size for Auto-Checking ‘In selection’”, then the selected text is NOT supposed to be copied to Find what, regardless of the setting for Fill Find Field with Selected Text.
Why not?
Well, if the code has determined that you are going to be searching within the selected text, copying the selected text to Find what doesn’t make sense. You already know how many matches that would generate (exactly one). So, it is waiting for you to put something different in Find what.
¹ : In attempting to verify this before posting, I found out that it has to be greater-than, not greater-than-or-equal-to. Thus, for the default case of 1024, if 1024 bytes are selected when Ctrl+f is invoked, the checkbox will become checkmarked AND the selected text will be copied to Find what. This seems wrong to me; it should be as I first stated, greater-than-or-equal-to.
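The boundary behavior described in this footnote can be sketched in a few lines of Python. This is only a model of what has been observed in this thread for the default value of 1024; the function name and return shape are hypothetical and nothing here is taken from the Notepad++ source code.

```python
# Hypothetical model of the observed behavior when Ctrl+F is invoked.
# THRESHOLD is the default "Minimum Size for Auto-Checking 'In selection'".
THRESHOLD = 1024


def find_dialog_state(selected_len, threshold=THRESHOLD):
    """Return (in_selection_checked, copied_to_find_what) for a selection length."""
    # Observed: the checkbox is checkmarked once the selection reaches the threshold.
    in_selection_checked = selected_len >= threshold
    # Observed: the selection is still copied at exactly the threshold,
    # and only skipped when it is strictly greater.
    copied_to_find_what = 0 < selected_len <= threshold
    return in_selection_checked, copied_to_find_what


for n in (1022, 1024, 1802):
    print(n, find_dialog_state(n))
```

With exactly 1024 selected, both values are true, which is the overlap that seems wrong in the footnote; a greater-than-or-equal-to test on the copy side would remove it.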
Third manipulation :
…
Select, again, the contents of the second regex ( 1,802 chars )
Before opening the Find dialog, just hit the Ctrl + F3 shortcut ( This is the WORK-AROUND ! )
Ctrl+F3 is Select and Find Next which is wholly different from a “select” followed by a Find Next. It’s not affected by, nor constrained by, the In selection setting.
It’s important to add that, modifying in Preferences… > Searching > When Find Dialog is Invoked the value 1,024 to the value 0, does NOT change the global behavior
I’m unclear on what you mean by this.
-
Hi, @mkupper, @alan-kilborn and All,
Sorry, I was out these last three hours !
When I said :
It’s important to add that, modifying in Preferences... > Searching > When Find Dialog is Invoked the value 1,024 to the value 0, does NOT change the global behavior
I meant that my three manipulations produce the same results whether you previously chose the 1,024 value or the 0 value : NO change in behavior.
BR
guy038
Thinking about it, I don’t know if this is judicious. If I choose the zero value, as auto-checking is disabled, if I select a large amount of text and immediately invoke the Find dialog, it should fill up the Find what field up to 2,046 characters !
-
@guy038 said:
If I choose the zero value, as auto-checking is disabled, if I select an important amount of text and immediately invoke the Find dialog, it should fill up the Find what field up to 2,046 characters !
I’d say that that sounds reasonable.
-
@guy038, @Alan-Kilborn, and others
Congratulations on discovering that Ctrl+F3 trick to bypass the 1024 character selection to the Find-what field limit.
That prompted me to do a test.
- I created a 5000 character long ruler line with ________10________20________30…______4980______4990______5000.
- I duplicated that line a few times.
- I put the caret on one of the lines and did Ctrl+F3. Notepad++ immediately jumped to the next line and there’s a bunch of text selected.
- I did Ctrl+C to load the selected text into the copy/paste buffer, and pasted to a blank line in the area below my list of 5000 character rulers.
- I saw that the new line is 2047 characters long and runs from ________10________20________30…______2030______2040______2
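For anyone who wants to reproduce the test, a ruler line like this one can be generated with a short Python sketch. It assumes each number is right-aligned in a 10-character field padded with underscores, which matches the fragments shown above; the function name is just for illustration.

```python
def make_ruler(length=5000, step=10):
    """Build a ruler where every number ends exactly at the column it names."""
    # Each number is right-justified in a field of `step` chars, padded with "_".
    return "".join(str(n).rjust(step, "_") for n in range(step, length + 1, step))


ruler = make_ruler()
print(len(ruler))          # total length of the ruler line
print(ruler[:30])          # start of the ruler
print(ruler[:2047][-17:])  # tail of the first 2047 characters
```

Because each number ends at its own column, the tail of whatever gets selected or matched tells you its length at a glance.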
My first thought was, “wait, I thought the search limit was 2046 characters. Apparently the quick search thing using Ctrl+F3 will search for up to 2047 character patterns.”
I verified that a reverse quick search using Ctrl+Shift+F3 also allows for up to 2047 character patterns.
- I moved the caret to a blank line (it can be any blank area) and did Ctrl+F to activate the normal find dialog.
- I see that Find what is populated and that it has 2047 characters in the Find what field.
- I click [Find Next] which selects some stuff, Esc to close the dialog, Ctrl+C to load the text I found into the copy/paste buffer, and paste that to a new line.
The search using [Find Next] matched the first 2046 characters that were in the Find-what field.
Anyway, that’s excellent that we can have regular expressions that are up to 2046 characters long and get them into the Find-what field without using copy/paste. The procedure will be to:
- Select the long regexp
- Ctrl+F3 or Shift+Ctrl+F3
- Move the caret to a blank area so that the caret is not within nor touching a word.
- Ctrl+F and the Find-what field will be populated.
This will be handy as something I frequently do is to load the replacement string into the copy/paste buffer, select the search pattern, Ctrl+H, tab down to the Replace with field, and Ctrl+V to fill in the replacement pattern.
That works for search patterns for up to 1024 characters and now with @guy038’s Ctrl+F3 trick I can get around the 1024 character limit and don’t need to use the copy/paste buffer to do that.
A few weeks ago I took a look at the Notepad++ source code to better understand the 1024 and 2046 character limits.
- The 1024 character limit comes from the default value for the Minimum size for auto-checking in-selection setting. It’s a Notepad++ bug as that setting is unrelated to the limit for auto-loading the selection into the Find what field. Unfortunately, the fix is not easy as the internal constant that has the 1024 is used in several different ways.
- The 2046 character limit seems to be either a different Notepad++ bug or a Scintilla limit or bug. The buffers that hold search and replace patterns are 2048 16-bit characters long. The pattern is NUL terminated meaning we should be able to have up to 2047 character long patterns. The code has an extra subtraction somewhere that causes 2047 to be 2046. The extra subtraction seems to be buried in the Notepad++ logic that’s dealing with Scintilla.
While looking at the 2046 character issue I saw that the pattern buffer uses 16-bit wide characters. I verified that you can search for up to 2046 16-bit characters such as ⛱⛱⛱...⛱⛱⛱ (U+26F1). If you search for an extended Unicode character such as 🦎🦎🦎...🦎🦎🦎 (U+1F98E or the surrogate pair \x{D83E}\x{DD8E}) then you will be limited to 1023 characters.
I did discover some weirdness with Ctrl+F3.
- Using Ctrl+F3 when the caret is within a short string such as ⛱⛱⛱⛱⛱⛱⛱⛱ seems to do nothing. The text is not loaded into the Find what field. I suspect that ⛱ is not a word character.
- ❽ (U+277D) is a word character but Ctrl+F3 gets confused by ❽❽❽...❽❽❽ as it seems to be selecting and searching for the entire line. I was able to do 5000 character search matches using long strings of ❽❽❽...❽❽❽. As I knew the buffers are 2048 words long I suspected there was a buffer overflow.
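The 2046 versus 1023 figures for 16-bit characters follow from plain UTF-16 arithmetic, which can be checked with a small Python sketch. The 2046 limit is the value reported in this thread, not something read from the Notepad++ source, and the helper names are hypothetical.

```python
MAX_UNITS = 2046  # search limit in 16-bit units, as observed in this thread


def utf16_units(text):
    """Number of 16-bit code units needed to store `text` as UTF-16."""
    return len(text.encode("utf-16-le")) // 2


def max_repeats(ch):
    """How many copies of `ch` fit within MAX_UNITS 16-bit units."""
    return MAX_UNITS // utf16_units(ch)


umbrella = "\u26F1"    # BMP character: one 16-bit unit
lizard = "\U0001F98E"  # outside the BMP: stored as a surrogate pair

print(utf16_units(umbrella), max_repeats(umbrella))
print(utf16_units(lizard), max_repeats(lizard))
```

A character outside the Basic Multilingual Plane takes two 16-bit units, so the same buffer holds half as many of them.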
-
@mkupper said:
The 1024 character limit comes from the default value for the Minimum size for auto-checking in-selection setting. It’s a Notepad++ bug as that setting is unrelated to the limit for auto-loading the selection into the Find what field
It isn’t really a bug, it’s more of a historical vestige. Before the setting existed, 1024 was used for even more purposes. Think of the setting as always-existing, but at an unchangeable value: 1024.
-
@Alan-Kilborn said in Bug when a multi-lines regex is used in the 'Search', 'Replace' or 'Mark' dialog:
It isn’t really a bug, it’s more of a historical vestige. Before the setting existed, 1024 was used for even more purposes. Think of the setting as always-existing, but at an unchangeable value: 1024.
Agreed but it’s not quite that bad at present. There is a constant within the npp source code that defines both the maximum value and default value for the Settings / Preferences / Searching / Minimum Size for Auto-Checking "In selection". It defaults to 1024. The length of the current selection is visible in the Sel: part of Notepad++'s status line. If the number is 0 to 1,023 and you do Ctrl+F, Ctrl+H, or Ctrl+M to bring up the Find, Replace, or Mark dialog boxes then the In Selection field will not be enabled. If the number is 1024 or larger and you do Ctrl+F, Ctrl+H, or Ctrl+M then the In Selection field will be enabled. You can change this threshold via Settings / Preferences.
That works well.
The same internal constant that defines the default and/or maximum value for the Minimum Size for Auto-Checking "In selection" thing I just mentioned is also used by the code for the Find, Replace, and Mark dialog boxes to decide if the current selection should auto-populate the Find what field. If the current selection is from 1 to 1024 characters then Find what gets populated with whatever is in the selection. If the current selection is zero or is more than 1024 characters then the selection is ignored and Find what contains whatever was in there before.
The preferences setting for Minimum Size for Auto-Checking "In selection" does not control whether the current selection auto-populates Find what. The auto-population limit is a constant and is 1024.
A few weeks ago I wrote up some notes to myself about the name of this internal constant and started teasing out how and where the constant gets used. I can’t find my notes at the moment. My plan at the time was to submit a feature request on github that adds a new constant and shows exactly how and where in the code it should be used so that we can separate out the current selection to In selection vs current selection to Find what. I’d still like to do that but at the time I realized the npp code is a marvelously tangled ball of yarn and so needed to move carefully with my nip-n-tuck.
I also realized I probably should work on being able to compile my own copies of Notepad++.exe as there were areas where the current values of some internal variables are not obvious. I wanted to change the current selection to Find what limit from 1024 to 2047 characters and to do that should also fix whatever causes Notepad++'s 2046 character limit. Nearly all of Notepad++'s code thinks the limit for search patterns is 2047 characters but something in there restricts searches to 2046 characters.
-
Hello, @mkupper, @alan-kilborn and All,
Here is another example where the search limits ( 1,024 and 2,046 ) prevent us from correctly searching with a multi-lines regex !
Follow the link below and you’ll get the general multi-lines regex of an URI ( Uniform Resource Identifier )
https://jmrware.com/articles/2009/uri_regexp/URI_regex.html#uri-43
We get this section :
# RFC-3986 URI component: URI-reference (?: # ( [A-Za-z][A-Za-z0-9+\-.]* : # URI (?: // (?: (?:[A-Za-z0-9\-._~!$&'()*+,;=:]|%[0-9A-Fa-f]{2})* @)? (?: \[ (?: (?: (?: (?:[0-9A-Fa-f]{1,4}:){6} | :: (?:[0-9A-Fa-f]{1,4}:){5} | (?: [0-9A-Fa-f]{1,4})? :: (?:[0-9A-Fa-f]{1,4}:){4} | (?: (?:[0-9A-Fa-f]{1,4}:){0,1} [0-9A-Fa-f]{1,4})? :: (?:[0-9A-Fa-f]{1,4}:){3} | (?: (?:[0-9A-Fa-f]{1,4}:){0,2} [0-9A-Fa-f]{1,4})? :: (?:[0-9A-Fa-f]{1,4}:){2} | (?: (?:[0-9A-Fa-f]{1,4}:){0,3} [0-9A-Fa-f]{1,4})? :: [0-9A-Fa-f]{1,4}: | (?: (?:[0-9A-Fa-f]{1,4}:){0,4} [0-9A-Fa-f]{1,4})? :: ) (?: [0-9A-Fa-f]{1,4} : [0-9A-Fa-f]{1,4} | (?: (?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9]) \.){3} (?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9]) ) | (?: (?:[0-9A-Fa-f]{1,4}:){0,5} [0-9A-Fa-f]{1,4})? :: [0-9A-Fa-f]{1,4} | (?: (?:[0-9A-Fa-f]{1,4}:){0,6} [0-9A-Fa-f]{1,4})? :: ) | [Vv][0-9A-Fa-f]+\.[A-Za-z0-9\-._~!$&'()*+,;=:]+ ) \] | (?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3} (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?) | (?:[A-Za-z0-9\-._~!$&'()*+,;=]|%[0-9A-Fa-f]{2})* ) (?: : [0-9]* )? (?:/ (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})* )* | / (?: (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})+ (?:/ (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})* )* )? | (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})+ (?:/ (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})* )* | ) (?:\? (?:[A-Za-z0-9\-._~!$&'()*+,;=:@/?]|%[0-9A-Fa-f]{2})* )? (?:\# (?:[A-Za-z0-9\-._~!$&'()*+,;=:@/?]|%[0-9A-Fa-f]{2})* )? | (?: // # / relative-ref (?: (?:[A-Za-z0-9\-._~!$&'()*+,;=:]|%[0-9A-Fa-f]{2})* @)? (?: \[ (?: (?: (?: (?:[0-9A-Fa-f]{1,4}:){6} | :: (?:[0-9A-Fa-f]{1,4}:){5} | (?: [0-9A-Fa-f]{1,4})? :: (?:[0-9A-Fa-f]{1,4}:){4} | (?: (?:[0-9A-Fa-f]{1,4}:){0,1} [0-9A-Fa-f]{1,4})? :: (?:[0-9A-Fa-f]{1,4}:){3} | (?: (?:[0-9A-Fa-f]{1,4}:){0,2} [0-9A-Fa-f]{1,4})? :: (?:[0-9A-Fa-f]{1,4}:){2} | (?: (?:[0-9A-Fa-f]{1,4}:){0,3} [0-9A-Fa-f]{1,4})? :: [0-9A-Fa-f]{1,4}: | (?: (?:[0-9A-Fa-f]{1,4}:){0,4} [0-9A-Fa-f]{1,4})? 
:: ) (?: [0-9A-Fa-f]{1,4} : [0-9A-Fa-f]{1,4} | (?: (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?) \.){3} (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?) ) | (?: (?:[0-9A-Fa-f]{1,4}:){0,5} [0-9A-Fa-f]{1,4})? :: [0-9A-Fa-f]{1,4} | (?: (?:[0-9A-Fa-f]{1,4}:){0,6} [0-9A-Fa-f]{1,4})? :: ) | [Vv][0-9A-Fa-f]+\.[A-Za-z0-9\-._~!$&'()*+,;=:]+ ) \] | (?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3} (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?) | (?:[A-Za-z0-9\-._~!$&'()*+,;=]|%[0-9A-Fa-f]{2})* ) (?: : [0-9]* )? (?:/ (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})* )* | / (?: (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})+ (?:/ (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})* )* )? | (?:[A-Za-z0-9\-._~!$&'()*+,;=@] |%[0-9A-Fa-f]{2})+ (?:/ (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})* )* | ) (?:\? (?:[A-Za-z0-9\-._~!$&'()*+,;=:@/?]|%[0-9A-Fa-f]{2})* )? (?:\# (?:[A-Za-z0-9\-._~!$&'()*+,;=:@/?]|%[0-9A-Fa-f]{2})* )? ) # )which is a very long multi-lines regex, of size
3,991 characters. Thus, not searchable with Notepad++ !
You could say : if this multi-lines regex is changed into a single-line regex, maybe it’ll be OK ?
First, let’s find a way to transform any MULTI-lines regex into an equivalent SINGLE-line regex :
- Open the Replace dialog ( Ctrl + H )
- Uncheck all the box options
- Check the In selection option ( IMPORTANT )
- FIND (?x-s) (?: \[ [^\x5B\x5D\r\n]+ \] | \\ [ #] | \\x20 | \\x23 ) (*SKIP) (*F) | \x20* [#] .* (?: \R | \z ) | \x20+ | \R
- REPLACE Leave EMPTY
- Select the Regular expression search mode
- Now, do a stream selection of all the lines of the MULTI-lines regex which must be shortened
- To end, click once only, on the Replace All button
=> You get the expected SINGLE-line regex, still selected
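For those who prefer to do this contraction outside Notepad++, here is a rough Python sketch of the same idea. Python's re module has no (*SKIP)(*F), so the protected tokens ( character classes, escaped space or #, \x20 and \x23 ) are captured in a named group and put back by the replacement function, while comments, spaces and line-breaks are dropped. The names COLLAPSE and collapse are hypothetical, and this is an approximation of the S/R above, not a tested equivalent for every regex.

```python
import re

# Protected tokens are captured in "keep"; everything else matched is deleted.
COLLAPSE = re.compile(r"""
    (?P<keep> \[ [^\[\]\r\n]+ \] | \\[ #] | \\x20 | \\x23 )  # tokens to preserve
  | [ ]* [#] [^\r\n]* (?: \r?\n | \r | \Z )                  # comment up to end of line
  | [ ]+                                                     # runs of spaces
  | \r?\n | \r                                               # line-breaks
""", re.VERBOSE)


def collapse(multi_line_regex):
    """Return a single-line form of a free-spacing (?x) regex."""
    return COLLAPSE.sub(lambda m: m.group("keep") or "", multi_line_regex)


sample = (
    "(?x-i)          # Search SENSIBLE to CASE\n"
    "(?<=\\x20)  # Preceded with SPACE\n"
    "(?: A | B[ ]C | \\# )  # Small group"
)
single = collapse(sample)
print(single, len(single))
```

Printing the length of the result makes it easy to see whether the contracted regex fits under the limits discussed in this thread.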
If we apply the above S/R to our regex example, it returns the following single-line regex :
(?:[A-Za-z][A-Za-z0-9+\-.]*:(?://(?:(?:[A-Za-z0-9\-._~!$&'()*+,;=:]|%[0-9A-Fa-f]{2})*@)?(?:\[(?:(?:(?:(?:[0-9A-Fa-f]{1,4}:){6}|::(?:[0-9A-Fa-f]{1,4}:){5}|(?:[0-9A-Fa-f]{1,4})?::(?:[0-9A-Fa-f]{1,4}:){4}|(?:(?:[0-9A-Fa-f]{1,4}:){0,1}[0-9A-Fa-f]{1,4})?::(?:[0-9A-Fa-f]{1,4}:){3}|(?:(?:[0-9A-Fa-f]{1,4}:){0,2}[0-9A-Fa-f]{1,4})?::(?:[0-9A-Fa-f]{1,4}:){2}|(?:(?:[0-9A-Fa-f]{1,4}:){0,3}[0-9A-Fa-f]{1,4})?::[0-9A-Fa-f]{1,4}:|(?:(?:[0-9A-Fa-f]{1,4}:){0,4}[0-9A-Fa-f]{1,4})?::)(?:[0-9A-Fa-f]{1,4}:[0-9A-Fa-f]{1,4}|(?:(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\.){3}(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9]))|(?:(?:[0-9A-Fa-f]{1,4}:){0,5}[0-9A-Fa-f]{1,4})?::[0-9A-Fa-f]{1,4}|(?:(?:[0-9A-Fa-f]{1,4}:){0,6}[0-9A-Fa-f]{1,4})?::)|[Vv][0-9A-Fa-f]+\.[A-Za-z0-9\-._~!$&'()*+,;=:]+)\]|(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)|(?:[A-Za-z0-9\-._~!$&'()*+,;=]|%[0-9A-Fa-f]{2})*)(?::[0-9]*)?(?:/(?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})*)*|/(?:(?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})+(?:/(?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})*)*)?|(?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})+(?:/(?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})*)*|)(?:\?(?:[A-Za-z0-9\-._~!$&'()*+,;=:@/?]|%[0-9A-Fa-f]{2})*)?(?:\#(?:[A-Za-z0-9\-._~!$&'()*+,;=:@/?]|%[0-9A-Fa-f]{2})*)?|(?://(?:(?:[A-Za-z0-9\-._~!$&'()*+,;=:]|%[0-9A-Fa-f]{2})*@)?(?:\[(?:(?:(?:(?:[0-9A-Fa-f]{1,4}:){6}|::(?:[0-9A-Fa-f]{1,4}:){5}|(?:[0-9A-Fa-f]{1,4})?::(?:[0-9A-Fa-f]{1,4}:){4}|(?:(?:[0-9A-Fa-f]{1,4}:){0,1}[0-9A-Fa-f]{1,4})?::(?:[0-9A-Fa-f]{1,4}:){3}|(?:(?:[0-9A-Fa-f]{1,4}:){0,2}[0-9A-Fa-f]{1,4})?::(?:[0-9A-Fa-f]{1,4}:){2}|(?:(?:[0-9A-Fa-f]{1,4}:){0,3}[0-9A-Fa-f]{1,4})?::[0-9A-Fa-f]{1,4}:|(?:(?:[0-9A-Fa-f]{1,4}:){0,4}[0-9A-Fa-f]{1,4})?::)(?:[0-9A-Fa-f]{1,4}:[0-9A-Fa-f]{1,4}|(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))|(?:(?:[0-9A-Fa-f]{1,4}:){0,5}[0-9A-Fa-f]{1,4})?::[0-9A-Fa-f]{1,4}|(?:(?:[0-9A-Fa-f]{1,4}:){0,6}[0-9A-Fa-f]{1
,4})?::)|[Vv][0-9A-Fa-f]+\.[A-Za-z0-9\-._~!$&'()*+,;=:]+)\]|(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)|(?:[A-Za-z0-9\-._~!$&'()*+,;=]|%[0-9A-Fa-f]{2})*)(?::[0-9]*)?(?:/(?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})*)*|/(?:(?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})+(?:/(?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})*)*)?|(?:[A-Za-z0-9\-._~!$&'()*+,;=@]|%[0-9A-Fa-f]{2})+(?:/(?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})*)*|)(?:\?(?:[A-Za-z0-9\-._~!$&'()*+,;=:@/?]|%[0-9A-Fa-f]{2})*)?(?:\#(?:[A-Za-z0-9\-._~!$&'()*+,;=:@/?]|%[0-9A-Fa-f]{2})*)?)Which is still a very long regex of size
2,609 characters ! So, no chance : even with the contracted form of the regex, it’s still over 2,046 characters and not searchable with Notepad++ :-((
By the way, the site’s ability to highlight any sub-section of the regex in green is really awesome !
Best Regards,
guy038
-