Search++: A work in progress

guy038

Well, I’m 74. Now, with my glasses I have nearly 20/20 vision in my left eye but only 3/10 in my right eye, due to a vascular problem in the retina that I had about 10 years ago :-((

No, I don’t have a high-DPI monitor, and here is a snapshot of the Replace button, at actual size :

[Screenshot: the Replace button at actual size]

Of course, if I look at the button, I’m able to notice the mark →❙, after the word Replace, but I may miss it sometimes !

Thank you for pointing out the specifics of, and the differences between, the character classes in Columns++ and Search++.

So, here is a summary on this topic :

•==============================•=============•================•=========================================================================•
|               Regex          |  Columns++  | Search++ Regex |                              Search++ ICU                               |
•==============================•=============•================•==============•==========================================================•
|  (?-i)\l                     |    2,283    |      2,595     |         1    |  Letter l                                                |
|  (?-i)[[:lower:]]            |    2,283    |      2,595     |     2,595    |  = (?-i)\p{Ll} + (?-i)\p{Other Lowercase} = 2,283 + 312  |
|                              |             |                |              |                                                          |
|  (?-i)\p{Ll}                 |    2,283    |      2,283     |     2,283    |                                                          |
|  (?-i)[[:lowercase letter:]] |    2,283    |      2,283     |     2,283    |                                                          |
|  (?-i)[[:Ll:]]               |    2,283    |      2,283     |     2,283    |                                                          |
•------------------------------•-------------•----------------•--------------•----------------------------------------------------------•
|  (?-i)\u                     |    1,886    |      2,006     |    Invalid   |  ???                                                     |
|  (?-i)[[:upper:]]            |    1,886    |      2,006     |     2,006    |  = (?-i)\p{Lu} + (?-i)\p{Other Uppercase} = 1,886 + 120  |
|                              |             |                |              |                                                          |
|  (?-i)\p{Lu}                 |    1,886    |      1,886     |     1,886    |                                                          |
|  (?-i)[[:Uppercase letter:]] |    1,886    |      1,886     |     1,886    |                                                          |
|  (?-i)[[:Lu:]]               |    1,886    |      1,886     |     1,886    |                                                          |
•==============================•=============•================•==============•==========================================================•
|  (?i)\l                      |    2,283    |      2,595     |         2    |  Letters L and l                                         |
|  (?i)[[:lower:]]             |    2,283    |      2,595     |     4,082    |                                                          |
|                              |             |                |              |                                                          |
|  (?i)\p{Ll}                  |    2,283    |      2,283     |     3,729    |                                                          |
|  (?i)[[:lowercase letter:]]  |    2,283    |      2,283     |     3,729    |                                                          |
|  (?i)[[:Ll:]]                |    2,283    |      2,283     |     3,729    |                                                          |
•------------------------------•-------------•----------------•--------------•----------------------------------------------------------•
|  (?i)\u                      |    1,886    |      2,006     |    Invalid   |  ???                                                     |
|  (?i)[[:upper:]]             |    1,886    |      2,006     |     3,484    |                                                          |
|                              |             |                |              |                                                          |
|  (?i)\p{Lu}                  |    1,886    |      1,886     |     3,322    |                                                          |
|  (?i)[[:Uppercase letter:]]  |    1,886    |      1,886     |     3,322    |                                                          |
|  (?i)[[:Lu:]]                |    1,886    |      1,886     |     3,322    |                                                          |
•==============================•=============•================•==============•==========================================================•

Note that all the properties below are used internally by the Unicode Consortium to generate other properties and are not intended to be used stand-alone. They only contribute to real properties, so ICU has no direct support for them, and they all return the Invalid regular expression message !

\p{Jamo_Short_Name}                       \p{JSN}
\p{Other Alphabetic}                      \p{OAlpha} 
\p{Other Default Ignorable Code Point}    \p{ODI}    
\p{Other Grapheme Extend}                 \p{OGr Ext}
\p{Other ID Continue}                     \p{OIDC}   
\p{Other ID Start}                        \p{OIDS}   
\p{Other Lowercase}                       \p{OLower} 
\p{Other Math}                            \p{OMath}  
\p{Other Uppercase}                       \p{OUpper}

You can verify the number of characters, in these categories, from the files https://www.unicode.org/Public/UCD/latest/ucd/PropList.txt and https://www.unicode.org/Public/UCD/latest/ucd/Jamo.txt
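
As a rough way to do that check, here is a minimal Python sketch that totals the code points listed for one property in text laid out like PropList.txt. The function name and the short sample are illustrative only; the sample rows follow the file's `range ; property # comment` layout, and the real file would have to be downloaded from the URL above to get actual counts.

```python
def count_property(proplist_text, prop):
    """Count the code points assigned to `prop` in PropList.txt-style text."""
    total = 0
    for line in proplist_text.splitlines():
        data = line.split("#", 1)[0].strip()    # drop the trailing comment
        if not data:
            continue                             # blank or comment-only line
        fields = [f.strip() for f in data.split(";")]
        if len(fields) < 2 or fields[1] != prop:
            continue
        first, _, last = fields[0].partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start   # single code point or range
        total += end - start + 1
    return total


# Illustrative sample in the PropList.txt layout (not the full file).
SAMPLE = """\
# comment line
00AA          ; Other_Lowercase # Lo       FEMININE ORDINAL INDICATOR
00BA          ; Other_Lowercase # Lo       MASCULINE ORDINAL INDICATOR
02B0..02B8    ; Other_Lowercase # Lm   [9] MODIFIER LETTER SMALL H..MODIFIER LETTER SMALL Y
2160..216F    ; Other_Uppercase # Nl  [16] ROMAN NUMERAL ONE..ROMAN NUMERAL ONE THOUSAND
"""

print(count_property(SAMPLE, "Other_Lowercase"))   # 11
print(count_property(SAMPLE, "Other_Uppercase"))   # 16
```

Run over the complete file, the totals for Other_Lowercase and Other_Uppercase can be compared with the 312 and 120 shown in the table; they may differ if the Unicode version of the file is not the one ICU was built with.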

Best regards

guy038

Coises

@guy038 said in Search++: A work in progress:

No, I don’t have a high-DPI monitor, and here is a snapshot of the Replace button, at actual size :

The proportion is a bit different on my system:

so it seems the display does differ from system to system. I will find a way to make those icons more recognizable and obvious… I just don’t know when.

Lachlanmax

@Coises
It works great for me in this version.

I especially like the Remove Marks option and how well it works together with the bookmarks. That combination feels very natural in the workflow now.

For the moment I would not suggest any changes, it already feels quite solid for how I use it.
Thanks for the good work you did for my requests!

guy038

Hello, @coises and All,

I have almost finished studying the whole ICU syntax, focusing mainly on Unicode properties and, so far, all the results seem coherent ;-))

However, I noticed something strange about the property Numeric_Value

For reference, see https://www.unicode.org/Public/UCD/latest/ucd/extracted/DerivedNumericValues.txt

For any number which has a finite number of decimal places, like 0.1875 or 0.4, located after the equal sign, in the \p{Numeric_Value=....} syntax, the returned result is always correct. For example, \p{Numeric_Value=0.25} or \p{nv=0.25} returns 14 matches, as you may verify in the Unicode file

However, for any number which has an infinite number of decimal places, like 0.333333333333 or 0.142857142857, located after the equal sign, in the \p{Numeric_Value=....} syntax, the returned result is always 0 instead of an integer > 0

In addition, the special value \p{Numeric_Value=NaN} should return 323,567, according to my Total_Chars.txt file, instead of the value 0 !

Is there any chance that you truncated the decimal part in Search++ or something like that ?

Best Regards,

guy038

Coises

@guy038 said in Search++: A work in progress:

Is there any chance that you truncated the decimal part in Search++ or something like that ?

No. I did not mess with the ICU search at all; the string entered in the Find box is exactly what ICU’s regular expression engine gets.

I see that \p{nv=0.3333333333333333} (or any greater number of 3s) returns six matches. Likewise, \p{nv=0.66666666666666667} returns seven matches, but using fewer 6s returns none.

Since the ICU4C function u_getNumericValue(UChar32 c) returns a double, I would guess that matching is dependent on the precise quirks of double-precision floating point format.
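
To illustrate that guess, here is a small Python sketch. Python floats are IEEE 754 doubles, so it shows how many truncated decimal places a fraction needs before the literal parses to the same double as the fraction itself. It does not call ICU, and the thread does not establish how ICU parses the number after nv=, so treat it as an analogy only. The helper name is made up for the example.

```python
from fractions import Fraction


def places_needed(frac, max_places=25):
    """Smallest number of truncated decimal places whose literal parses
    to the same double as frac (assumes 0 < frac < 1)."""
    target = frac.numerator / frac.denominator       # correctly rounded double
    for places in range(1, max_places + 1):
        digits = (frac.numerator * 10**places) // frac.denominator
        literal = "0." + str(digits).zfill(places)   # keeps leading zeros
        if float(literal) == target:
            return places
    return None


for f in (Fraction(1, 4), Fraction(1, 3), Fraction(2, 3), Fraction(1, 7)):
    print(f, places_needed(f))
```

For these four fractions the loop reports 2, 16, 16 and 17 places respectively: a value such as 0.25 is exactly representable, while a repeating decimal has to be written out far enough to land on the nearest double.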

In addition, the special value \p{Numeric_Value=NaN} should return 323,567, according to my Total_Chars.txt file, instead of the value 0 !

There might not be anything you can enter that will be translated as Not-a-Number. I note that \p{Numeric_Type=None} does return 323,567 matches.

guy038

Hello, @coises and All,

Ah… OK. Thank you very much for your insight ! So, it seems that :

When all the digits after the decimal point are identical, you need at least 16 digits
When the digits after the decimal point differ, you need at least 17 digits

Thus, this list of all these rational numbers :

    1/12     \p{Numeric_Value=0.08333333333333333}    or    \p{nv=0.08333333333333333}    =     1
    1/9      \p{Numeric_Value=0.1111111111111111}     or    \p{nv=0.1111111111111111}     =     1
    1/7      \p{Numeric_Value=0.14285714285714285}    or    \p{nv=0.14285714285714285}    =     1
    1/6      \p{Numeric_Value=0.16666666666666666}    or    \p{nv=0.16666666666666666}    =     4
    1/3      \p{Numeric_Value=0.3333333333333333}     or    \p{nv=0.3333333333333333}     =     6
    5/12     \p{Numeric_Value=0.41666666666666666}    or    \p{nv=0.41666666666666666}    =     1
    7/12     \p{Numeric_Value=0.58333333333333333}    or    \p{nv=0.58333333333333333}    =     1
    2/3      \p{Numeric_Value=0.6666666666666666}     or    \p{nv=0.6666666666666666}     =     7
    5/6      \p{Numeric_Value=0.83333333333333333}    or    \p{nv=0.83333333333333333}    =     3
    11/12    \p{Numeric_Value=0.91666666666666666}    or    \p{nv=0.91666666666666666}    =     1

I had not even noticed that \p{Numeric_Type=None} does give that expected number which, added to all the other values, yields the total number of characters in the Total_Chars.txt file, which is 325,590 !

BR

guy038

Coises

Search++ version 0.5.4 is available:

Fix bookmarks and Show command not working with ICU search engine.
Fix unwanted control character inserted when focus is in the Find or Replace box and a keyboard shortcut is used to activate a Tools menu command that opens a dialog.
Use a custom font for button symbols. (Those are the symbols that change when you shift+click one of the drop-down menus to change the button click command, to remind you at a glance what the current command extent and scope are.) Hopefully these are easier to read on different systems. Feedback is encouraged if the button symbols are hard to read or do not look right. This is a first attempt. Changes will probably follow in subsequent releases.

@guy038:

This should fix bookmarks (and also the Show command) not working with ICU, and the unexpected insertion of control characters when using Ctrl+Shift+E and Ctrl+Shift+Y.

The big change here (what took me so long) is using a custom font for the symbols on the buttons. How to make that work was… not obvious. Let me know if they are easier to read—and if they are enough easier to read. There is a tough space constraint on those buttons, but I really want to avoid making them any larger if at all possible. This was my first time trying to make a font of any sort. I’m sure the designs of those symbols can be refined a bit (or a lot).

guy038

Hello, @coises, @Lachlanmax and All,

Waoou ! This new 0.5.4 version of Search++ is almost perfect ! I do hope that @Lachlanmax will feel the same as I do regarding the Dark Mode display, which I don’t use personally !

The Bookmarks and Show commands, in ICU mode, work correctly.
The possible insertion of control chars within the Find and Replace boxes has gone away !
The symbols, written on the different buttons, are much more intuitive and easily allow us to control what we’re doing. I particularly like the Open Documents scope and Documents in this view scope symbols !
As implemented in the previous version, when focus is on Search++, a Ctrl + J action toggles between Jump to next match and Do not jump to next match. But now, the difference between the two symbols is really more obvious when looking to the right of the Replace button !

One remark :

For the Selection scope, the symbol does not really look like a true letter S, unlike the Marked Text scope, which clearly displays the symbol M !

See the screenshot below, with the Whole document scope symbol to the left of the Find button, the Selection symbol to the left of the Count button and the Marked Text symbol to the left of the Find all button :

[Screenshot: the Whole document, Selection and Marked Text scope symbols beside the Find, Count and Find all buttons]

The Selection scope seems less easy to identify at first sight, doesn’t it ?

Maybe you could choose, for example, the Unicode character 1F142, which is SQUARED LATIN CAPITAL LETTER S : 🅂 and, in the same way, the character 1F13C, which is SQUARED LATIN CAPITAL LETTER M : 🄼 ?

For reference, see https://www.unicode.org/charts/PDF/U1F100.pdf

Now, a very simple bug to fix :

When ICU is selected, if you try to do a simple Replace operation, Search++ displays the expected message Command not implemented and, of course, no replacement occurs.

Oddly, if you click on the Replace All button, the plugin displays the message Replaced xx matches in ... where xx represents the number of matches detected in the current document ! But, luckily, no global replacement is performed either. I suppose that the same message Command not implemented should be triggered, shouldn’t it ?

BTW, if the replacement process was allowed, in ICU mode, it seems that it would allow more than 9 back-references but would not accept any conditional replacement !

I also noted that the recursion feature is not allowed with the ICU regex engine !

Best Regards

guy038

P.S. :

I tried to double-click on the font file Search++-Private-Symbols.otf and I was able to recognize all the symbols used by your plugin !
In this version, in addition to the Search++-Private-Symbols.otf file, you also added a Search++.pdb file, which is quite large, indeed ! What is it used for ?

Coises

@guy038 said in Search++: A work in progress:

The symbols, written on the different buttons, are much more intuitive and easily allow us to control what we’re doing. I particularly like the Open Documents scope and Documents in this view scope symbols !

As implemented in the previous version, when focus is on Search++, a Ctrl + J action toggles between Jump to next match and Do not jump to next match. But now, the difference between the two symbols is really more obvious when looking to the right of the Replace button !

Good! I was hoping these symbols would be more consistently easy to read across different systems. Thank you for confirming that it looks better on yours.

For the Selection scope, the symbol does not really look like a true letter S, unlike the Marked Text scope, which clearly displays the symbol M !

I see your point. I’m not happy with using letters instead of graphic representations, but so far I have not been successful at inventing “pictures” of selected and marked text that render well as characters in a font. I will keep trying and, failing anything better, at least make a more identifiable “S” for selection.

Now, a very simple bug to fix :

When ICU is selected, if you try to do a simple Replace operation, Search++ displays the expected message Command not implemented and, of course, no replacement occurs.

Oddly, if you click on the Replace All button, the plugin displays the message Replaced xx matches in ... where xx represents the number of matches detected in the current document ! But, luckily, no global replacement is performed either. I suppose that the same message Command not implemented should be triggered, shouldn’t it ?

Yes, it should show “Command not implemented.” I’ll fix that.

In this version, in addition to the Search++-Private-Symbols.otf file, you also added a Search++.pdb file, which is quite large, indeed ! What is it used for ?

Visual Studio uses *.pdb files to support debugging. I included it in the x64 zip file by mistake. I’ve replaced the Search++-0.5.4-x64.zip artifact in the release with one that doesn’t contain that file. Thank you for catching that.

guy038

Hi, @coises and All,

Ah…, many thanks, in advance, for resolving bugs and trying to take my suggestions into account !

Regarding the Search++.pdb file, I can, of course, try to re-download the x64 archive. But can I, without any problem, simply delete this present file in my Search++ folder ?

BR

guy038

Coises

@guy038 said in Search++: A work in progress:

Regarding the Search++.pdb file, I can, of course, try to re-download the x64 archive. But can I, without any problem, simply delete this present file in the Search++ folder ?

Yes.

guy038

Hello, @coises and All,

Sorry to disturb you again, but would it be possible to, either :

Increase the width of the caret

OR

Allow, like in native N++, the modification of its width, in your Settings dialog

Personally, I use the value 3 in Caret Settings in Notepad++ and yours seems really tiny. So it is rather difficult to notice the caret location at first sight in the Find dialog !

I must admit that I have probably gotten used to this maximum value for the caret and that, now, any smaller size bothers me a bit !

Best Regards,

guy038

Coises

@guy038 said in Search++: A work in progress:

Sorry to disturb you again, but would it be possible to, either :

Increase the width of the caret

OR

Allow, like in native N++, the modification of its width, in your Settings dialog

I will add that to the list of things I copy from the active document window when I initialize my Scintilla controls, and look for other similar settings (like the blink rate) that I missed. Not everything is simple to copy, but those are.

Thank you for pointing that out.

guy038

Hello, @coises and All,

I have finally managed to produce an almost exhaustive list of all the Unicode properties recognized by the ICU regex engine of the Search++ plugin !

As its file size is about 213 KB, I’m sharing this simple text file, named ICU.txt, on my Drive Account

https://drive.google.com/file/d/15n8ttdX0hNxIazlRkToZn2XsINus5IOW/view?usp=sharing

Of course, this is my first attempt, which probably will need some modifications !

Best Regards,

guy038

Coises

@guy038 said in Search++: A work in progress:

I have finally managed to produce an almost exhaustive list of all the Unicode properties recognized by the ICU regex engine of the Search++ plugin !

[…]

https://drive.google.com/file/d/1litn6Ggjk-nRc8UOuxYS-5iO10_J-Z_2/view?usp=sharing

Edit to match updated quoted post: the link is now

https://drive.google.com/file/d/15n8ttdX0hNxIazlRkToZn2XsINus5IOW/view?usp=sharing

That clearly entailed a lot of work!

When I get closer to a “real” release, I will ask your permission to include some or all of that information as an appendix in the documentation for Search++.

guy038

Hi, @coises,

That’s really kind of you to ask for my permission. But, considering all the times you’ve listened to me, the least I can do is, of course, give you my full permission to use this file ;-))

Just note that I noticed, at the very end of the ICU.txt file, a small section that I used to verify if the \p{...} syntaxes were written correctly

As this part should not be part of the file, I modified the current file which, of course, changed the sharing link ! And I’ve just updated this link in my previous post !

Best Regards,

guy038

guy038

Hello, @coises and All,

Here is the second version of my list of all the Unicode properties, recognized by the ICU regex engine of the Search++ plugin !

I added 5 Unicode properties and some sections ( unsupported features, deprecated properties,… ) and I corrected a lot of mistakes !

In addition, although the ICU syntax is very flexible, I tried to adopt the same scheme throughout all sections of this file !

You can download this text file, named ICU.txt, from my Drive Account, at this location :

https://drive.google.com/file/d/1PAY5C2JO0q4-j8kfGKYs3VKarKn_xVa5/view?usp=sharing

Of course, I also deleted the previous version !

Now, one question regarding ICU

So far, you have chosen to disable replacements when using the ICU regular expression engine. Why is that, exactly :

Are you worried about a possible malfunction of that feature ?
Are there any technical obstacles to implementing such a feature ?
Do you need more time to learn about and/or implement this feature ?
Or did you simply decide that it would never be used ?

Personally, I don’t see why this should be any more complicated than when using the Boost search engine when you click on the Regex button of your plugin !

Best Regards,

guy038

Coises

@guy038 said in Search++: A work in progress:

Now, one question regarding ICU

So far, you have chosen to disable replacements when using the ICU regular expression engine. Why is that, exactly :

Are you worried about a possible malfunction of that feature ?

Are there any technical obstacles to implementing such a feature ?

Do you need more time to learn about and/or implement this feature ?

Or did you simply decide that it would never be used ?

Personally, I don’t see why this should be any more complicated than when using the Boost search engine when you click on the Regex button of your plugin !

The Boost.Regex design includes an interface that accesses the text to be searched through a templated iterator. That’s a bit of a technical C++ concept. In short, template means that the programmer can specify what sort of value will be matched (I chose a full Unicode code point, UTF-32) and iterator means that rather than giving the interface a single, contiguous block of memory filled with the value type, you give it a kind of index and separately write routines that return the value at that index; increase the index to the index of the next value; and decrease the index to the index of the previous value.

Scintilla stores documents in two separate blocks with a gap between — this facilitates inserting and deleting text without having to move all the following data every time — and it stores the data either in the system default encoding (“ANSI”) or in UTF-8. The template-and-iterator concept works well with this. Notepad++ search works that way, but there were technical reasons I did not want to use the same iterator code Notepad++ uses. Writing the three iterators I needed (one for single byte character sets, one for double-byte character sets, and one for UTF-8) was one of the trickier parts of getting my Columns++ search to work. Writing the template specialization for the “character traits” of my UTF-32 values was also a bit of work.
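
The idea of an index plus "value at", "next" and "previous" routines over gapped UTF-8 storage can be sketched in Python. This is a toy model written for this post, not Scintilla's or Boost.Regex's actual interface; the class and method names are invented, and it assumes the stored bytes are valid UTF-8.

```python
class GappedUtf8:
    """Toy document: UTF-8 bytes held in two blocks with an unused gap between."""

    def __init__(self, text, gap_at, gap_size=8):
        data = text.encode("utf-8")
        self.buf = data[:gap_at] + bytes(gap_size) + data[gap_at:]
        self.gap_start = gap_at
        self.gap_size = gap_size
        self.length = len(data)              # logical length, gap excluded

    def byte_at(self, pos):
        # Translate a logical position into a physical one, skipping the gap.
        if pos >= self.gap_start:
            pos += self.gap_size
        return self.buf[pos]

    def next_pos(self, pos):
        # Step over the lead byte, then any continuation bytes (10xxxxxx).
        pos += 1
        while pos < self.length and (self.byte_at(pos) & 0xC0) == 0x80:
            pos += 1
        return pos

    def prev_pos(self, pos):
        # Step back until a byte that is not a continuation byte.
        pos -= 1
        while pos > 0 and (self.byte_at(pos) & 0xC0) == 0x80:
            pos -= 1
        return pos

    def code_point_at(self, pos):
        end = self.next_pos(pos)
        raw = bytes(self.byte_at(i) for i in range(pos, end))
        return ord(raw.decode("utf-8"))


def code_points(doc):
    """Walk the whole document, yielding one UTF-32 value per character."""
    pos = 0
    while pos < doc.length:
        yield doc.code_point_at(pos)
        pos = doc.next_pos(pos)


text = "a\u00e9\u20ac\U0001F600z"            # 1-, 2-, 3- and 4-byte characters
doc = GappedUtf8(text, gap_at=4)             # gap placed inside a character
print(list(code_points(doc)) == [ord(c) for c in text])
```

The caller only ever sees logical positions and code point values; where the gap sits never leaks out, which is the property that lets a matcher run over the storage without copying it.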

When doing a replacement, Boost.Regex takes a structure that is produced as a result of a match and the replacement string with symbols ($1, etc.) and returns the string with replacements, etc. made. From that, I use normal Scintilla editing commands to replace the matched string with the processed replacement.
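
That two-step shape (expand the replacement from the match result, then edit only the matched range) can be sketched with Python's re module standing in for Boost.Regex and a bytearray standing in for the Scintilla document. Both stand-ins are assumptions made for illustration, as is the function name; Python templates write \1 where Boost format strings write $1, and working from the last match to the first is simply one way to keep earlier offsets valid.

```python
import re


def replace_in_place(buffer, pattern, template):
    """Replace every match inside `buffer` (a bytearray) and return the count."""
    matches = list(re.finditer(pattern, bytes(buffer)))
    for m in reversed(matches):              # last match first
        replacement = m.expand(template)     # match result + template -> bytes
        buffer[m.start():m.end()] = replacement   # edit only the matched range
    return len(matches)


doc = bytearray(b"cat hat bat")
count = replace_in_place(doc, rb"(\w)at", rb"\1og")
print(count, doc.decode("ascii"))
```

The rest of the document is never rebuilt here: each edit touches one range, which is the part that maps naturally onto editor commands.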

The regular expression search in ICU is not templated. It operates strictly in UTF-16. It does not use iterators, but it does have its own way of virtualizing the text to be searched (UText). The only format directly supported by UText that is also used in Scintilla is UTF-8. Scintilla does accept a command to make all its text contiguous (moving the gap to the end), after which the text can be accessed — so long as the text is not modified and only until the next Scintilla call is performed — as a UTF-8 string. By limiting ICU search to only UTF-8 documents and allowing no modification, I could use the utext_openUTF8 interface to access the internal Scintilla buffer (after telling Scintilla to make it contiguous) in a way that is acceptable to the ICU regular expression matching interface.

The Find and Replace operation in ICU Regular Expressions is, to me, very strange. The documented way to use it is to build a new text that reproduces the entire source text, with replacements. That would make sense when reading a file and writing a new file; but it is, to me, not obvious how to apply it sensibly in the context of a text editor.

It’s probably possible to do everything necessary for good integration with Scintilla with ICU regular expressions, it’s just a big task, starting with learning more about how their UText extensibility works. I saw enough to think that it could be made to work on “ANSI” documents, it would just be a whole other side project in itself to figure out how. (The problem with just converting the document to Unicode is that you wind up with the starting and ending character positions of a match in the converted text, but no good way to convert those to positions in the original text, which is what you need to select the match in Scintilla.) Beyond the “ANSI” problem, forcing Scintilla to move its gap to the end before each search is not ideal, and for large documents that are being changed (as in find and replace) the performance loss would be bad. So even for Unicode I would need to write a different UText extension that doesn’t require contiguous UTF-8.
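
The position problem mentioned in the parenthesis can be made concrete with a short Python sketch. It shows one naive workaround, a lookup table built alongside the conversion, and is not something Search++ does or is planned to do. The function name is hypothetical, cp932 is just an example of a double-byte code page, and the sketch indexes by code point, whereas ICU reports offsets in UTF-16 code units.

```python
def byte_offsets(text, encoding):
    """offsets[i] is the byte position in the encoded document where
    character i of the decoded text starts; the final entry is the total
    byte length. Assumes a stateless encoding, so that each character
    can be encoded on its own."""
    offsets = [0]
    for ch in text:
        offsets.append(offsets[-1] + len(ch.encode(encoding)))
    return offsets


original = "a\u3042b".encode("cp932")    # document bytes as stored
decoded = original.decode("cp932")       # what a Unicode-only engine would see
table = byte_offsets(decoded, "cp932")

start, end = 1, 2                        # a match on the middle character
print(table[start], table[end])          # 1 3
```

Such a table grows with the document and goes stale after every edit, so it only illustrates the mapping that is needed, not a practical design.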

Then there’s analyzing that replacement logic and figuring out how to use it in a way that makes sense in Scintilla. It might be easier to implement replacements from scratch and completely ignore ICU’s replacement logic and syntax: based on what documentation exists, it seems likely that their supported syntax is very limited — none of the fancy stuff in Boost.Regex extended format strings. (Unfortunately, I couldn’t just use the Boost.Regex formatter, because it depends on the structured data produced by a Boost.Regex match, which is different than the structured data produced by an ICU match.)

Since I haven’t dug deeply into how ICU implements regular expressions, I can’t say how difficult it might be to customize or extend them. I have a better (not by any means comprehensive!) idea of what might be possible with Boost.Regex. Because of that, and because Notepad++ users are familiar with Boost.Regex syntax, I judged that it will probably be more practical to extend Boost.Regex with some features from ICU than to extend ICU toward Boost.Regex. Honestly, I don’t see the detailed Unicode properties of ICU as being nearly so valuable in practical, real-world use as the features Boost.Regex has that ICU doesn’t (\K and backtracking control are the ones I remember).

I do hope to extend the Boost.Regex implementation further. I’d like to implement Unicode word boundaries, but I haven’t yet gone into it deeply enough to determine whether it is practical. There might be a way to expose arbitrary Unicode properties, but that is also something I will have to study further. The biggest thing I’d like to do is figure out how to get a progress monitoring hook inside the matching process, so that annoying “too complex” message could be replaced by the ability to click cancel on a progress dialog — I know that will be challenging, and maybe not possible. I want to get the framework supporting what I have now up to a level where I feel that I can responsibly release it without classifying it as a “pre-release” before I get into those projects.

In the ICU search, I mostly included the stuff that was relatively easy — that I could copy from the same logic used for Plain and Regex. I thought it might serve as a good comparison test for whether I had implemented Unicode properties correctly in Regex — where there is a discrepancy, I should know why (such as my “ignore case insensitivity for character classes” rule). So I expected to leave it for future use to check as I try to extend Boost.Regex some more, but I thought I would probably hide it so users wouldn’t stumble over it. I never really thought it would be something many (any?) users would want.

The bottom line is that I only have so much time and capacity for concentration, and I don’t think making ICU regular expressions fully functional is likely to be the best use of it — at least not yet.

guy038

Hi, @coises,

Many thanks for your very exhaustive answer ! So OK, I understand that the ICU replacement seems really difficult to implement !

In my opinion, as the replacement syntax seemed simpler when using ICU than when using Boost, I naively thought that a solution could be implemented fairly easily !

Sorry for my noob approach to the problem. And given what I now know, I won’t dare ask you about specific topics like this one, again. Just follow your train of thought which, I am convinced, will lead to a polished final Search++ plugin !

BR

guy038

Coises

@guy038 said in Search++: A work in progress:

I won’t dare ask you about specific topics like this one, again

There is no reason you shouldn’t ask. I am sorry if I sounded defensive or otherwise implied that the question was troublesome. I realize I probably gave a more exhaustive (or exhausting) reply than you really needed… I kind of got caught up “explaining it to myself.”

It’s never really possible for someone who isn’t doing the programming to know whether something will be easy, difficult or anywhere in between. Some things seem like they should be easy… and they are! Others have complications that don’t turn up until you’ve already put in a bunch of time and effort, then you discover one little detail that you can’t change that blocks your entire approach. So for me, too, it’s only a guess how much work it would be, though I have more information from which to make my guess.

Please don’t hesitate to ask questions and make suggestions. It’s possible that I have a good reason for doing or not doing any given thing; it’s equally possible that I just never thought about it. I don’t expect people who aren’t working with the code to know the difference, and neither should they expect that of themselves. Though I have to prioritize, and some things don’t make the “first cut,” all feedback helps; what doesn’t get done for one release will still be waiting to be considered in another. Even if I outright say, “No, I’m just not going to do that,” it still tells me there’s a need adjacent to what I’m building that I haven’t addressed, and I should think about how to make it better when I can.

Thank you, @guy038, for all the work you’ve done so far to test and explore this project.