Minor typo in the manual for regex control character \c☒

mkupper

The manual section for regular expressions / control characters has a minor typo in \c☒ ⇒ The control character obtained from character ☒ by stripping all but its 6 lowest order bits.

That should be the 5 lowest bits, not 6.

\c☒ turns out to work with Unicode characters U+0000 to U+FFFF for ☒. For example, to find a tab which is \x09 you can use \c followed by a tab character itself or any of these: \c) \cI \ci \c \c© \cÉ \cé \cĉ \cĩ \cŉ \cũ \cƉ \cƩ \cǉ \cǩ \cȉ \cȩ \cɉ \cɩ \cʉ \cʩ \cˉ \c˩ \c̉ \c̩ \c͉ \cͩ \cΉ \cΩ \cω \cϩ \cЉ \cЩ \cщ \cѩ \c҉ \cҩ \cӉ \cө \cԉ \cԩ \cՉ \cթ \c։ \c֩ \cש \c؉ \cة \cى \c٩ \cډ \cک \cۉ \c۩ \c܉ \cܩ \c݉ \cݩ \cމ \cީ \c߉ \cߩ \cࠉ \cࠩ \cࡉ \cࡩ \cࢉ \cࢩ \cࣉ \cࣩ \cउ \cऩ \cॉ \c३ \cউ \c৩ \cਉ \c੩ \cઉ \cૉ \c૩ \cଉ \c୩ \cஉ \cன \c௩ \cఉ \c౩ \cಉ \c೩ \cഉ \cഩ \c൩ \cඉ \cඩ \c෩ \cฉ \cษ \c้ \cຉ \cຩ \c້ \c༉ \c༩ \cཉ \cཀྵ \cྉ \cྩ \c࿉ \cဉ \cဩ \c၉ \cၩ \cႉ \cႩ \cჩ \cᄉ \cᄩ \cᅉ \cᅩ \cᆉ \cᆩ \cᇉ \cᇩ \cሉ \cሩ \cቩ \cኩ \cዉ \cዩ \cጉ \cጩ \cፉ \c፩ \cᎉ \cᎩ \cᏉ \cᏩ \cᐉ \cᐩ \cᑉ \cᑩ \cᒉ \cᒩ \cᓉ \cᓩ \cᔉ \cᔩ \cᕉ \cᕩ \cᖉ \cᖩ \cᗉ \cᗩ \cᘉ \cᘩ \cᙉ \cᙩ \cᚉ \cᚩ \cᛉ \cᛩ \cᜉ \cᜩ \cᝉ \cᝩ \cញ \cឩ \c៉ \c៩ \c᠉ \cᠩ \cᡉ \cᡩ \cᢉ \cᢩ \cᣉ \cᣩ \cᤉ \cᤩ \c᥉ \cᥩ \cᦉ \cᦩ \cᧉ \c᧩ \cᨉ \cᨩ \cᩉ \cᩩ \c᪉ \c᪩ \c᫉ \cᬉ \cᬩ \cᭉ \c᭩ \cᮉ \cᮩ \cᯉ \cᯩ \cᰉ \cᰩ \c᱉ \cᱩ \cᲩ \cᳩ \cᴉ \cᴩ \cᵉ \cᵩ \cᶉ \cᶩ \c᷉ \cᷩ \cḉ \cḩ \cṉ \cṩ \cẉ \cẩ \cỉ \cứ \cἉ \cἩ \cὉ \cὩ \cᾉ \cᾩ \cΈ \cῩ \c \c  \c⁉ \c⁩ \c₉ \c₩ \c⃩ \c℉ \c℩ \cⅉ \cⅩ \c↉ \c↩ \c⇉ \c⇩ \c∉ \c∩ \c≉ \c≩ \c⊉ \c⊩ \c⋉ \c⋩ \c⌉ \c〈 \c⍉ \c⍩ \c⎉ \c⎩ \c⏉ \c⏩ \c␉ \c⑉ \c⑩ \c⒉ \c⒩ \cⓉ \cⓩ \c┉ \c┩ \c╉ \c╩ \c▉ \c▩ \c◉ \c◩ \c☉ \c☩ \c♉ \c♩ \c⚉ \c⚩ \c⛉ \c⛩ \c✉ \c✩ \c❉ \c❩ \c➉ \c➩ \c⟉ \c⟩ \c⠉ \c⠩ \c⡉ \c⡩ \c⢉ \c⢩ \c⣉ \c⣩ \c⤉ \c⤩ \c⥉ \c⥩ \c⦉ \c⦩ \c⧉ \c⧩ \c⨉ \c⨩ \c⩉ \c⩩ \c⪉ \c⪩ \c⫉ \c⫩ \c⬉ \c⬩ \c⭉ \c⭩ \c⮉ \c⮩ \c⯉ \c⯩ \cⰉ \cⰩ \cⱉ \cⱩ \cⲉ \cⲩ \cⳉ \c⳩ \cⴉ \cⵉ \cⶉ \cⶩ \cⷉ \cⷩ \c⸉ \c⸩ \c⹉ \c⺉ \c⺩ \c⻉ \c⻩ \c⼉ \c⼩ \c⽉ \c⽩ \c⾉ \c⾩ \c⿉ \c〉 \c〩 \cぉ \cど \cら \cォ \cド \cラ \cㄉ \cㄩ \cㅉ \cㅩ \cㆉ \cㆩ \c㇉ \c㈉ \c㈩ \c㉉ \c㉩ \c㊉ \c㊩ \c㋉ \c㋩ \c㌉ \c㌩ \c㍉ \c㍩ \c㎉ \c㎩ \c㏉ \c㏩ \c㐉 \c㐩 \c㑉 \c㑩 \c㒉 \c㒩 \c㓉 \c㓩 \c㔉 \c㔩 \c㕉 \c㕩 \c㖉 \c㖩 \c㗉 \c㗩 \c㘉 \c㘩 \c㙉 \c㙩 \c㚉 \c㚩 \c㛉 \c㛩 \c㜉 \c㜩 \c㝉 \c㝩 \c㞉 \c㞩 \c㟉 \c㟩 \c㠉 \c㠩 \c㡉 \c㡩 \c㢉 \c㢩 \c㣉 \c㣩 \c㤉 \c㤩 \c㥉 \c㥩 \c㦉 \c㦩 \c㧉 \c㧩 \c㨉 \c㨩 \c㩉 \c㩩 \c㪉 \c㪩 \c㫉 \c㫩 \c㬉 \c㬩 \c㭉 \c㭩 \c㮉 \c㮩 \c㯉 \c㯩 \c㰉 \c㰩 \c㱉 \c㱩 \c㲉 \c㲩 \c㳉 \c㳩 
\c㴉 \c㴩 \c㵉 \c㵩 \c㶉 \c㶩 \c㷉 \c㷩 \c㸉 \c㸩 \c㹉 \c㹩 \c㺉 \c㺩 \c㻉 \c㻩 \c㼉 \c㼩 \c㽉 \c㽩 \c㾉 \c㾩 \c㿉 \c㿩 \c䀉 \c䀩 \c䁉 \c䁩 \c䂉 \c䂩 \c䃉 \c䃩 \c䄉 \c䄩 \c䅉 \c䅩 \c䆉 \c䆩 \c䇉 \c䇩 \c䈉 \c䈩 \c䉉 \c䉩 \c䊉 \c䊩 \c䋉 \c䋩 \c䌉 \c䌩 \c䍉 \c䍩 \c䎉 \c䎩 \c䏉 \c䏩 \c䐉 \c䐩 \c䑉 \c䑩 \c䒉 \c䒩 \c䓉 \c䓩 \c䔉 \c䔩 \c䕉 \c䕩 \c䖉 \c䖩 \c䗉 \c䗩 \c䘉 \c䘩 \c䙉 \c䙩 \c䚉 \c䚩 \c䛉 \c䛩 \c䜉 \c䜩 \c䝉 \c䝩 \c䞉 \c䞩 \c䟉 \c䟩 \c䠉 \c䠩 \c䡉 \c䡩 \c䢉 \c䢩 \c䣉 \c䣩 \c䤉 \c䤩 \c䥉 \c䥩 \c䦉 \c䦩 \c䧉 \c䧩 \c䨉 \c䨩 \c䩉 \c䩩 \c䪉 \c䪩 \c䫉 \c䫩 \c䬉 \c䬩 \c䭉 \c䭩 \c䮉 \c䮩 \c䯉 \c䯩 \c䰉 \c䰩 \c䱉 \c䱩 \c䲉 \c䲩 \c䳉 \c䳩 \c䴉 \c䴩 \c䵉 \c䵩 \c䶉 \c䶩 \c䷉ \c䷩ \c三 \c丩 \c义 \c乩 \c争 \c亩 \c仉 \c仩 \c伉 \c伩 \c佉 \c佩 \c侉 \c侩 \c俉 \c俩 \c倉 \c倩 \c偉 \c偩 \c傉 \c傩 \c僉 \c僩 \c儉 \c儩 \c光 \c兩 \c冉 \c冩 \c凉 \c凩 \c刉 \c利 \c剉 \c剩 \c劉 \c助 \c勉 \c勩 \c匉 \c匩 \c卉 \c卩 \c厉 \c厩 \c叉 \c叩 \c吉 \c吩 \c呉 \c呩 \c咉 \c咩 \c哉 \c哩 \c唉 \c唩 \c啉 \c啩 \c喉 \c喩 \c嗉 \c嗩 \c嘉 \c嘩 \c噉 \c噩 \c嚉 \c嚩 \c囉 \c囩 \c圉 \c圩 \c坉 \c坩 \c垉 \c垩 \c埉 \c埩 \c堉 \c堩 \c塉 \c塩 \c墉 \c墩 \c壉 \c壩 \c変 \c天 \c奉 \c奩 \c妉 \c妩 \c姉 \c姩 \c娉 \c娩 \c婉 \c婩 \c媉 \c媩 \c嫉 \c嫩 \c嬉 \c嬩 \c孉 \c孩 \c安 \c宩 \c寉 \c審 \c尉 \c尩 \c屉 \c屩 \c岉 \c岩 \c峉 \c峩 \c崉 \c崩 \c嵉 \c嵩 \c嶉 \c嶩 \c巉 \c巩 \c帉 \c帩 \c幉 \c幩 \c庉 \c庩 \c廉 \c廩 \c弉 \c弩 \c彉 \c彩 \c徉 \c復 \c忉 \c忩 \c怉 \c怩 \c恉 \c恩 \c悉 \c悩 \c惉 \c惩 \c愉 \c愩 \c慉 \c慩 \c憉 \c憩 \c應 \c懩 \c戉 \c戩 \c扉 \c扩 \c抉 \c抩 \c拉 \c择 \c按 \c挩 \c捉 \c捩 \c掉 \c掩 \c揉 \c揩 \c搉 \c搩 \c摉 \c摩 \c撉 \c撩 \c擉 \c擩 \c攉 \c攩 \c敉 \c敩 \c斉 \c斩 \c旉 \c早 \c昉 \c昩 \c晉 \c晩 \c暉 \c暩 \c曉 \c曩 \c有 \c朩 \c杉 \c杩 \c枉 \c枩 \c柉 \c柩 \c栉 \c栩 \c桉 \c桩 \c梉 \c梩 \c棉 \c棩 \c椉 \c椩 \c楉 \c楩 \c榉 \c榩 \c槉 \c槩 \c樉 \c権 \c橉 \c橩 \c檉 \c檩 \c櫉 \c櫩 \c欉 \c欩 \c歉 \c歩 \c殉 \c殩 \c毉 \c毩 \c氉 \c氩 \c汉 \c汩 \c沉 \c沩 \c泉 \c泩 \c洉 \c洩 \c浉 \c浩 \c涉 \c涩 \c淉 \c淩 \c渉 \c温 \c湉 \c湩 \c溉 \c溩 \c滉 \c滩 \c漉 \c漩 \c潉 \c潩 \c澉 \c澩 \c濉 \c濩 \c瀉 \c瀩 \c灉 \c灩 \c炉 \c炩 \c烉 \c烩 \c焉 \c焩 \c煉 \c煩 \c熉 \c熩 \c燉 \c燩 \c爉 \c爩 \c牉 \c物 \c犉 \c犩 \c狉 \c狩 \c猉 \c猩 \c獉 \c獩 \c玉 \c玩 \c珉 \c珩 \c琉 \c琩 \c瑉 \c瑩 \c璉 \c璩 \c瓉 \c瓩 \c甉 \c甩 \c畉 \c畩 \c疉 \c疩 \c痉 \c痩 \c瘉 \c瘩 \c癉 \c癩 \c皉 \c皩 \c盉 \c盩 \c眉 \c眩 \c睉 \c睩 \c瞉 \c瞩 \c矉 \c矩 \c砉 \c砩 \c硉 \c硩 \c碉 \c碩 \c磉 \c磩 \c礉 \c礩 \c祉 \c祩 \c禉 \c禩 \c秉 \c秩 \c稉 \c稩 \c穉 \c穩 \c窉 \c窩 \c竉 \c竩 \c笉 \c笩 \c等 \c筩 
\c箉 \c箩 \c築 \c篩 \c簉 \c簩 \c籉 \c籩 \c粉 \c粩 \c糉 \c糩 \c紉 \c紩 \c絉 \c絩 \c綉 \c綩 \c緉 \c緩 \c縉 \c縩 \c繉 \c繩 \c纉 \c纩 \c绉 \c绩 \c缉 \c缩 \c罉 \c罩 \c羉 \c義 \c翉 \c翩 \c耉 \c耩 \c聉 \c聩 \c肉 \c肩 \c胉 \c胩 \c脉 \c脩 \c腉 \c腩 \c膉 \c膩 \c臉 \c臩 \c舉 \c舩 \c艉 \c艩 \c芉 \c芩 \c苉 \c苩 \c茉 \c茩 \c草 \c荩 \c莉 \c莩 \c菉 \c菩 \c萉 \c萩 \c葉 \c葩 \c蒉 \c蒩 \c蓉 \c蓩 \c蔉 \c蔩 \c蕉 \c蕩 \c薉 \c薩 \c藉 \c藩 \c蘉 \c蘩 \c虉 \c虩 \c蚉 \c蚩 \c蛉 \c蛩 \c蜉 \c蜩 \c蝉 \c蝩 \c螉 \c螩 \c蟉 \c蟩 \c蠉 \c蠩 \c衉 \c衩 \c袉 \c袩 \c裉 \c裩 \c褉 \c褩 \c襉 \c襩 \c覉 \c覩 \c觉 \c觩 \c訉 \c訩 \c詉 \c詩 \c誉 \c誩 \c諉 \c諩 \c謉 \c謩 \c證 \c譩 \c讉 \c让 \c诉 \c诩 \c谉 \c谩 \c豉 \c豩 \c貉 \c販 \c賉 \c賩 \c贉 \c贩 \c赉 \c赩 \c趉 \c趩 \c跉 \c跩 \c踉 \c踩 \c蹉 \c蹩 \c躉 \c躩 \c軉 \c軩 \c載 \c輩 \c轉 \c轩 \c辉 \c辩 \c迉 \c迩 \c选 \c逩 \c遉 \c適 \c邉 \c邩 \c郉 \c郩 \c鄉 \c鄩 \c酉 \c酩 \c醉 \c醩 \c釉 \c釩 \c鈉 \c鈩 \c鉉 \c鉩 \c銉 \c銩 \c鋉 \c鋩 \c錉 \c錩 \c鍉 \c鍩 \c鎉 \c鎩 \c鏉 \c鏩 \c鐉 \c鐩 \c鑉 \c鑩 \c钉 \c钩 \c铉 \c铩 \c锉 \c锩 \c镉 \c镩 \c閉 \c閩 \c闉 \c闩 \c阉 \c阩 \c陉 \c险 \c隉 \c隩 \c雉 \c雩 \c霉 \c霩 \c靉 \c革 \c鞉 \c鞩 \c韉 \c韩 \c頉 \c頩 \c顉 \c顩 \c颉 \c颩 \c飉 \c飩 \c餉 \c餩 \c饉 \c饩 \c馉 \c馩 \c駉 \c駩 \c騉 \c騩 \c驉 \c驩 \c骉 \c骩 \c髉 \c髩 \c鬉 \c鬩 \c魉 \c魩 \c鮉 \c鮩 \c鯉 \c鯩 \c鰉 \c鰩 \c鱉 \c鱩 \c鲉 \c鲩 \c鳉 \c鳩 \c鴉 \c鴩 \c鵉 \c鵩 \c鶉 \c鶩 \c鷉 \c鷩 \c鸉 \c鸩 \c鹉 \c鹩 \c麉 \c麩 \c黉 \c黩 \c鼉 \c鼩 \c齉 \c齩 \c龉 \c龩 \c鿉 \c鿩 \cꀉ \cꀩ \cꁉ \cꁩ \cꂉ \cꂩ \cꃉ \cꃩ \cꄉ \cꄩ \cꅉ \cꅩ \cꆉ \cꆩ \cꇉ \cꇩ \cꈉ \cꈩ \cꉉ \cꉩ \cꊉ \cꊩ \cꋉ \cꋩ \cꌉ \cꌩ \cꍉ \cꍩ \cꎉ \cꎩ \cꏉ \cꏩ \cꐉ \cꐩ \cꑉ \cꑩ \cꒉ \c꒩ \cꓩ \cꔉ \cꔩ \cꕉ \cꕩ \cꖉ \cꖩ \cꗉ \cꗩ \cꘉ \c꘩ \cꙉ \cꙩ \cꚉ \cꚩ \cꛉ \cꛩ \c꜉ \cꜩ \cꝉ \cꝩ \c꞉ \cꞩ \cꟉ \cꠉ \c꠩ \cꡉ \cꡩ \cꢉ \cꢩ \c꣩ \c꤉ \cꤩ \cꥉ \cꥩ \cꦉ \cꦩ \c꧉ \cꧩ \cꨉ \cꨩ \cꩉ \cꩩ \cꪉ \cꪩ \cꫩ \cꬉ \cꬩ \cꭉ \cꭩ \cꮉ \cꮩ \cꯉ \cꯩ \cퟩ \c契 \c朗 \c雷 \c數 \c黎 \c囹 \c柳 \c里 \c降 \c﨩 \c爫 \c響 \c憎 \c睊 \c韛 \c﬩ \cשּ \cﭩ \cﮉ \cﮩ \cﯩ \cﰉ \cﰩ \cﱉ \cﱩ \cﲉ \cﲩ \cﳉ \cﳩ \cﴉ \cﴩ \c﵉ \cﵩ \cﶉ \cﶩ \c︉ \c︩ \c﹉ \c﹩ \cﺉ \cﺩ \cﻉ \cﻩ \c） \cＩ \cｉ \cｩ \cﾉ \cﾩ \c￩

All of those are Unicode characters where the lower five bits are 01001.
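As a quick sanity check of the "lower five bits" reading, here is a minimal Python sketch. It only models the masking arithmetic on code points and is not the Boost engine itself; the function name `control_code` is made up for illustration.

```python
def control_code(ch: str) -> int:
    """Return the value left after keeping only the 5 lowest-order bits
    of the character's code point (hypothetical helper, not a Boost API)."""
    return ord(ch) & 0x1F  # 0x1F == 0b11111


# A few characters taken from the list above; each should map to 0x09 (tab).
samples = [")", "I", "i", "©", "É", "é", "ĉ", "Ω", "щ", "三"]

for ch in samples:
    print(f"U+{ord(ch):04X} -> \\x{control_code(ch):02X}")
```

Masking with six bits (0x3F) instead would give different values for `)`, `I` and `i`, which is why the manual's "6 lowest order bits" cannot be right.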

The intent behind \c☒ was that it would be used with \ci or \cI as Ctrl+I is a tab.

Alan Kilborn

@mkupper

https://github.com/notepad-plus-plus/npp-usermanual/issues

guy038

Hello, @mkupper, @peterjones and All,

I had a look at the part of the N++ documentation about finding the C0 control chars, mentioned by @mkupper!

Remember that the Unicode C0 control characters range is [\x00-\x1F] ONLY!

Regarding the \c☒ notation, the Boost regex engine follows the rules of the equivalence table below:

                                                        0020     0040     0060     0080       00A0     00C0     00E0     0100     0120     ...      FF80
                                                        003F     005F     007F     009F       00BF     00DF     00FF     011F     013F     ...      FF9F
													  
\x00  =  NUL  ( NULL                     )  =  \x00  =  \c       \c@      \c`      \cPAD      \c       \cÀ      \cà      \cĀ      \cĠ      ...      \cﾀ
\x01  =  SOH  ( START of HEADER          )  =  \x01  =  \c!      \cA      \ca      \cHOP      \c¡      \cÁ      \cá      \cā      \cġ      ...      \cﾁ
\x02  =  STX  ( START of TEXT            )  =  \x02  =  \c"      \cB      \cb      \cBPH      \c¢      \cÂ      \câ      \cĂ      \cĢ      ...      \cﾂ
\x03  =  ETX  ( END   of TEXT            )  =  \x03  =  \c#      \cC      \cc      \cNBH      \c£      \cÃ      \cã      \că      \cģ      ...      \cﾃ
\x04  =  EOT  ( END   of TRANSMISSION    )  =  \x04  =  \c$      \cD      \cd      \cIND      \c¤      \cÄ      \cä      \cĄ      \cĤ      ...      \cﾄ
\x05  =  ENQ  ( ENQUIRY                  )  =  \x05  =  \c%      \cE      \ce      \cNEL      \c¥      \cÅ      \cå      \cą      \cĥ      ...      \cﾅ
\x06  =  ACK  ( ACKNOWLEDGEMENT          )  =  \x06  =  \c&      \cF      \cf      \cSSA      \c¦      \cÆ      \cæ      \cĆ      \cĦ      ...      \cﾆ
\x07  =  BEL  ( BELL                     )  =  \x07  =  \c'      \cG      \cg      \cESA      \c§      \cÇ      \cç      \cć      \cħ      ...      \cﾇ
\x08  =  BS   ( BACK SPACE               )  =  \x08  =  \c(      \cH      \ch      \cHTS      \c¨      \cÈ      \cè      \cĈ      \cĨ      ...      \cﾈ
\x09  =  TAB  ( HORIZONTAL TABULATION    )  =  \x09  =  \c)      \cI      \ci      \cHTJ      \c©      \cÉ      \cé      \cĉ      \cĩ      ...      \cﾉ
\x0A  =  LF   ( LINE FEED                )  =  \x0A  =  \c*      \cJ      \cj      \cVTS      \cª      \cÊ      \cê      \cĊ      \cĪ      ...      \cﾊ
\x0B  =  VT   ( VERTICAL   TABULATION    )  =  \x0B  =  \c+      \cK      \ck      \cPLD      \c«      \cË      \cë      \cċ      \cī      ...      \cﾋ
\x0C  =  FF   ( FORM FEED                )  =  \x0C  =  \c,      \cL      \cl      \cPLU      \c¬      \cÌ      \cì      \cČ      \cĬ      ...      \cﾌ
\x0D  =  CR   ( CARRIAGE RETURN          )  =  \x0D  =  \c-      \cM      \cm      \cRI       \c      \cÍ      \cí      \cč      \cĭ      ...      \cﾍ
\x0E  =  SO   ( SHIFT OUT                )  =  \x0E  =  \c.      \cN      \cn      \cSS2      \c®      \cÎ      \cî      \cĎ      \cĮ      ...      \cﾎ
\x0F  =  SI   ( SHIFT IN                 )  =  \x0F  =  \c/      \cO      \co      \cSS3      \c¯      \cÏ      \cï      \cď      \cį      ...      \cﾏ
\x10  =  DLE  ( DATA LINK ESCAPE         )  =  \x10  =  \c0      \cP      \cp      \cDCS      \c°      \cÐ      \cð      \cĐ      \cİ      ...      \cﾐ
\x11  =  DC1  ( DEVICE CONTROL 1         )  =  \x11  =  \c1      \cQ      \cq      \cPU1      \c±      \cÑ      \cñ      \cđ      \cı      ...      \cﾑ
\x12  =  DC2  ( DEVICE CONTROL 2         )  =  \x12  =  \c2      \cR      \cr      \cPU2      \c²      \cÒ      \cò      \cĒ      \cĲ      ...      \cﾒ
\x13  =  DC3  ( DEVICE CONTROL 3         )  =  \x13  =  \c3      \cS      \cs      \cSTS      \c³      \cÓ      \có      \cē      \cĳ      ...      \cﾓ
\x14  =  DC4  ( DEVICE CONTROL 4         )  =  \x14  =  \c4      \cT      \ct      \cCCH      \c´      \cÔ      \cô      \cĔ      \cĴ      ...      \cﾔ
\x15  =  NAK  ( NEGATIVE ACKNOWLEDGEMENT )  =  \x15  =  \c5      \cU      \cu      \cMW       \cµ      \cÕ      \cõ      \cĕ      \cĵ      ...      \cﾕ
\x16  =  SYN  ( SYNCHRONISATION          )  =  \x16  =  \c6      \cV      \cv      \cSPA      \c¶      \cÖ      \cö      \cĖ      \cĶ      ...      \cﾖ
\x17  =  ETB  ( END TRANSMISSION BLOCK   )  =  \x17  =  \c7      \cW      \cw      \cEPA      \c·      \c×      \c÷      \cė      \cķ      ...      \cﾗ
\x18  =  CAN  ( CANCEL                   )  =  \x18  =  \c8      \cX      \cx      \cSOS      \c¸      \cØ      \cø      \cĘ      \cĸ      ...      \cﾘ
\x19  =  EM   ( END of MEDIUM            )  =  \x19  =  \c9      \cY      \cy      \cSGCI     \c¹      \cÙ      \cù      \cę      \cĹ      ...      \cﾙ
\x1A  =  SUB  ( SUBSTITUTION             )  =  \x1A  =  \c:      \cZ      \cz      \cSCI      \cº      \cÚ      \cú      \cĚ      \cĺ      ...      \cﾚ
\x1B  =  ESC  ( ESCAPE                   )  =  \x1B  =  \c;      \c[      \c{      \cCSI      \c»      \cÛ      \cû      \cě      \cĻ      ...      \cﾛ
\x1C  =  FS   ( FILE   SEPARATOR         )  =  \x1C  =  \c<      \c\      \c|      \cST       \c¼      \cÜ      \cü      \cĜ      \cļ      ...      \cﾜ
\x1D  =  GS   ( GROUP  SEPARATOR         )  =  \x1D  =  \c=      \c]      \c}      \cOSC      \c½      \cÝ      \cý      \cĝ      \cĽ      ...      \cﾝ
\x1E  =  RS   ( RECORD SEPARATOR         )  =  \x1E  =  \c>      \c^      \c~      \cPM       \c¾      \cÞ      \cþ      \cĞ      \cľ      ...      \cﾞ
\x1F  =  US   ( UNIT   SEPARATOR         )  =  \x1F  =  \c?      \c_      \c      \cAPC      \c¿      \cß      \cÿ      \cğ      \cĿ      ...      \cﾟ

Note that the values under the 0080 - 009F column represent the string \c followed by the true C1 control char, in the range [\x80-\x9F].
So, paradoxically, these C1 control values may also be used to identify the C0 control characters!

Thus, for example, if you want to search for any SHIFT OUT control char (\x0E), you can use any of these regexes:

\x0E , \x{0E} or \x{000E}
\c.
\cN
\cn
\c
\c®
\cÎ
\cî
\cĎ
\cĮ
...
...
...
\cﾎ
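The list above can be reproduced by stepping through the BMP in increments of 0x20, starting from the control code. The Python sketch below does that. It assumes the five-bit masking described in this thread, skips the surrogate range, and says nothing about which characters Notepad++ will actually let you type into the Find field. The name `c_escape_partners` is hypothetical.

```python
def c_escape_partners(control_code: int, limit: int = 0xFFFF) -> list[str]:
    """List BMP characters whose 5 lowest bits equal control_code,
    starting at column 0020 of the table (illustrative helper only)."""
    if not 0x00 <= control_code <= 0x1F:
        raise ValueError("expected a C0 control code in the range 0x00-0x1F")
    return [
        chr(cp)
        for cp in range(control_code + 0x20, limit + 1, 0x20)
        if not 0xD800 <= cp <= 0xDFFF  # surrogates are not real characters
    ]


shift_out = c_escape_partners(0x0E)
# The first few entries, written as \c patterns
# (the fourth holds an invisible C1 control character).
print(" ".join("\\c" + ch for ch in shift_out[:9]))
```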

So, Peter when you say that the search \c1 matches the SOH char (\x01), it’s not exact. The \c1 search matches the DC1 char (\x11)!

And I confirm that any \c string followed by a char outside the BMP (so over \x{FFFF}) cannot be used to reach a C0 control char!

Best Regards,

guy038

PeterJones

@guy038 said in Minor typo in the manual for regex control character \c☒:

So, Peter when you say that the search \c1 matches

I didn’t say that. Most of the Regex documentation was direct copy/paste from the original Wiki version that the Manual was derived from, including that original phrasing. (It had been edited over time, but the original version still had it described essentially the same)

I will fix it, but it wasn’t my mistake originally. (Given that 1 and ! are on the same key on US keyboards, whoever typed that line in the original Wiki probably just didn’t hold down the shift key while trying to type the correct \c! for the SOH).

I will update the manual so it doesn’t use that example at all, and instead just keep the \ca and \cA versions, since those are the ones that are mnemonically helpful.

guy038

Hi, @mkupper, @peterjones, and All,

Yes, @peterjones, you’re right: the \cA and \ca syntaxes seem to be the only pertinent ones, in addition to the \x## notation!

BR

guy038

mkupper

@guy038, @peterjones, and others.

It turns out the \c☒ topic gets fairly messy, far too messy to document in detail in the manual. I started playing with ANSI…

\c☒ with ANSI or ASCII codes \x00 to \x7F works well and searches for the lower five bits of the ☒ character. Realistically, you should only do it with A-Z or a-z. Better yet is to use \x## or \x{####} style expressions, as it’s clearer what is being searched for.

A case-sensitive search for \c☒ using ANSI codes \x80 to \xFF matches ANSI codes in the \xE0 to \xFF range, with some exceptions… The logic first extracts the lower five bits of ☒ and then bitwise-ORs them with 11100000 (0xE0). For example, all of these will match ANSI character 0xEC, which is ì.

Hex		Pattern
\x8C		\cŒ
\xAC		\c¬
\xCC		\cÌ
\xEC		\cì

The lower five bits of the hex codes above (\x8C, \xAC, \xCC, and \xEC) are 01100, or \x0C; bitwise-ORing that with 11100000 (0xE0) gives \xEC, which is what gets searched for.
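Here is a small Python sketch of that observed arithmetic. It assumes "ANSI" means Windows-1252 (the hex codes in the table above agree with that code page). It only models the rule as described in this post and ignores the exceptions discussed in this thread. The function name is made up.

```python
def ansi_c_escape_target(ch: str, codepage: str = "cp1252") -> str:
    """Model the observed ANSI behaviour for bytes \\x80-\\xFF:
    keep the 5 lowest bits, then OR with 0xE0 (hypothetical helper)."""
    byte = ch.encode(codepage)[0]
    if byte < 0x80:
        raise ValueError("this sketch only covers bytes \\x80 to \\xFF")
    target = (byte & 0x1F) | 0xE0
    return bytes([target]).decode(codepage)


for ch in ["Œ", "¬", "Ì", "ì"]:
    byte = ch.encode("cp1252")[0]
    print(f"\\c{ch} (\\x{byte:02X}) -> {ansi_c_escape_target(ch)}")
```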

It turns out that, with one exception, all of the ANSI characters in the \xE0 to \xFF range are lower-case letters. A case-insensitive search for \c☒ using ANSI codes \x80 to \xFF works just like the case-sensitive version I just described, but also matches the upper-case forms of the letters in the \xE0 to \xFF range.

The one exception is ANSI character code \xF7, which is the division sign ÷. A search for \c—, \c·, \c×, or \c÷ only matches ÷ when you use a case-insensitive search.

Searching for \c followed by a space (\x20), \c@ (\x40), \c` (\x60), \c€ (\x80), \c followed by a no-break space (\xA0), \cÀ (\xC0), or \cà (\xE0) matches NUL (\x00) in ANSI-encoded files. With one exception, these also match NUL (\x{0000}) in UTF-8 encoded files. The exception is \c€ (\x80), which matches \x{000C} (form feed) and not NUL (\x{0000}).
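One plausible explanation for the € exception, sketched in Python: in Windows-1252 the euro sign is byte \x80, but its Unicode code point is U+20AC, so the five-bit mask gives a different result depending on which value is masked. This is an assumption about why the results differ, not something confirmed from the Boost source. The helper names are made up.

```python
def low_five_bits_ansi(ch: str, codepage: str = "cp1252") -> int:
    """Mask the single-byte ANSI value of ch (illustrative helper)."""
    return ch.encode(codepage)[0] & 0x1F


def low_five_bits_unicode(ch: str) -> int:
    """Mask the Unicode code point of ch (illustrative helper)."""
    return ord(ch) & 0x1F


euro = "€"
print(f"ANSI byte  \\x{euro.encode('cp1252')[0]:02X} -> \\x{low_five_bits_ansi(euro):02X}")
print(f"code point U+{ord(euro):04X} -> \\x{low_five_bits_unicode(euro):02X}")
```

For À and à, the byte value and the code point are the same (\xC0 and \xE0), so both masks agree on NUL.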

Because searches for \c€ (\x80), \c followed by a no-break space (\xA0), \cÀ (\xC0), and \cà (\xE0) all match NUL (\x00) in ANSI files, you can’t use them to match the lower-case à at ANSI character \xE0 nor its upper-case À at \xC0.

I also found that Notepad++ supports searching for \x00 or \x{0000}, both of which match a NUL (\x00 or \x{0000}) in a file, but using \x00 or \x{0000} in the replacement part results in the replacement string being terminated at the NUL character.

As replacement strings are terminated at the NUL, using \c~, where the ~ is a NUL (\x00), returns Invalid Regular Expression, with the details being:

ASCII escape sequence terminated
prematurely. The error occurred
while parsing the regular expression:
'>>>HERE>>>\c'.

Searching for xxx and replacing with aaa\x00zzz or aaa\x{0000}zzz both result in xxx being replaced with aaa, as the replacement string is terminated at the NUL. Apparently the engine first does a pass where it converts the \x☒☒ and \x{☒☒☒☒} forms into the actual character values, meaning \x00 or \x{0000} in a replacement simply terminates the string at that point.

I suspect that bug could be used to add a comment to the replacement!
Search: Hello
Replace: World\x0 This will never happen
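To illustrate the suspected two-step behaviour (expand the hex escapes first, then treat the result as a NUL-terminated string), here is a simplified Python model. It is a guess at the mechanism based on the results above, not the actual Notepad++ or Boost code. The function names and the escape-matching regex are assumptions.

```python
import re

# Matches \x{HHHH} or \xH / \xHH, a simplified view of the hex escapes.
_HEX_ESCAPE = re.compile(r"\\x\{([0-9A-Fa-f]+)\}|\\x([0-9A-Fa-f]{1,2})")


def expand_hex_escapes(replacement: str) -> str:
    """Step 1 (assumed): turn hex escapes into the characters they name."""
    return _HEX_ESCAPE.sub(
        lambda m: chr(int(m.group(1) or m.group(2), 16)), replacement
    )


def effective_replacement(replacement: str) -> str:
    """Step 2 (assumed): everything from the first NUL onward is lost,
    as it would be for a C-style NUL-terminated string."""
    return expand_hex_escapes(replacement).split("\x00", 1)[0]


for text in [r"aaa\x00zzz", r"aaa\x{0000}zzz", r"World\x0 This will never happen"]:
    print(repr(text), "->", repr(effective_replacement(text)))
```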

Windows also uses NUL as the text string terminator in its copy/paste system.