Minor typo in the manual for regex control character \c☒
-
The manual section for regular expressions / control characters has a minor typo in:
\c☒ ⇒ The control character obtained from character ☒ by stripping all but its 6 lowest order bits.
That should be the 5 lowest bits, not 6.
\c☒turns out to work with Unicode characters U+0000 to U+FFFF for ☒. For example, to find a tab which is \x09 you can use \c followed by a tab character itself or any of these: \c) \cI \ci \c \c© \cÉ \cé \cĉ \cĩ \cʼn \cũ \cƉ \cƩ \clj \cǩ \cȉ \cȩ \cɉ \cɩ \cʉ \cʩ \cˉ \c˩ \c̉ \c̩ \c͉ \cͩ \cΉ \cΩ \cω \cϩ \cЉ \cЩ \cщ \cѩ \c҉ \cҩ \cӉ \cө \cԉ \cԩ \cՉ \cթ \c։ \c֩ \cש \c؉ \cة \cى \c٩ \cډ \cک \cۉ \c۩ \c܉ \cܩ \c݉ \cݩ \cމ \cީ \c߉ \cߩ \cࠉ \cࠩ \cࡉ \cࡩ \cࢉ \cࢩ \cࣉ \cࣩ \cउ \cऩ \cॉ \c३ \cউ \c৩ \cਉ \c੩ \cઉ \cૉ \c૩ \cଉ \c୩ \cஉ \cன \c௩ \cఉ \c౩ \cಉ \c೩ \cഉ \cഩ \c൩ \cඉ \cඩ \c෩ \cฉ \cษ \c้ \cຉ \cຩ \c້ \c༉ \c༩ \cཉ \cཀྵ \cྉ \cྩ \c࿉ \cဉ \cဩ \c၉ \cၩ \cႉ \cႩ \cჩ \cᄉ \cᄩ \cᅉ \cᅩ \cᆉ \cᆩ \cᇉ \cᇩ \cሉ \cሩ \cቩ \cኩ \cዉ \cዩ \cጉ \cጩ \cፉ \c፩ \cᎉ \cᎩ \cᏉ \cᏩ \cᐉ \cᐩ \cᑉ \cᑩ \cᒉ \cᒩ \cᓉ \cᓩ \cᔉ \cᔩ \cᕉ \cᕩ \cᖉ \cᖩ \cᗉ \cᗩ \cᘉ \cᘩ \cᙉ \cᙩ \cᚉ \cᚩ \cᛉ \cᛩ \cᜉ \cᜩ \cᝉ \cᝩ \cញ \cឩ \c៉ \c៩ \c᠉ \cᠩ \cᡉ \cᡩ \cᢉ \cᢩ \cᣉ \cᣩ \cᤉ \cᤩ \c᥉ \cᥩ \cᦉ \cᦩ \cᧉ \c᧩ \cᨉ \cᨩ \cᩉ \cᩩ \c᪉ \c᪩ \c᫉ \cᬉ \cᬩ \cᭉ \c᭩ \cᮉ \cᮩ \cᯉ \cᯩ \cᰉ \cᰩ \c᱉ \cᱩ \cᲩ \cᳩ \cᴉ \cᴩ \cᵉ \cᵩ \cᶉ \cᶩ \c᷉ \cᷩ \cḉ \cḩ \cṉ \cṩ \cẉ \cẩ \cỉ \cứ \cἉ \cἩ \cὉ \cὩ \cᾉ \cᾩ \cΈ \cῩ \c \c \c⁉ \c \c₉ \c₩ \c⃩ \c℉ \c℩ \cⅉ \cⅩ \c↉ \c↩ \c⇉ \c⇩ \c∉ \c∩ \c≉ \c≩ \c⊉ \c⊩ \c⋉ \c⋩ \c⌉ \c〈 \c⍉ \c⍩ \c⎉ \c⎩ \c⏉ \c⏩ \c␉ \c⑉ \c⑩ \c⒉ \c⒩ \cⓉ \cⓩ \c┉ \c┩ \c╉ \c╩ \c▉ \c▩ \c◉ \c◩ \c☉ \c☩ \c♉ \c♩ \c⚉ \c⚩ \c⛉ \c⛩ \c✉ \c✩ \c❉ \c❩ \c➉ \c➩ \c⟉ \c⟩ \c⠉ \c⠩ \c⡉ \c⡩ \c⢉ \c⢩ \c⣉ \c⣩ \c⤉ \c⤩ \c⥉ \c⥩ \c⦉ \c⦩ \c⧉ \c⧩ \c⨉ \c⨩ \c⩉ \c⩩ \c⪉ \c⪩ \c⫉ \c⫩ \c⬉ \c⬩ \c⭉ \c⭩ \c⮉ \c⮩ \c⯉ \c⯩ \cⰉ \cⰩ \cⱉ \cⱩ \cⲉ \cⲩ \cⳉ \c⳩ \cⴉ \cⵉ \cⶉ \cⶩ \cⷉ \cⷩ \c⸉ \c⸩ \c⹉ \c⺉ \c⺩ \c⻉ \c⻩ \c⼉ \c⼩ \c⽉ \c⽩ \c⾉ \c⾩ \c⿉ \c〉 \c〩 \cぉ \cど \cら \cォ \cド \cラ \cㄉ \cㄩ \cㅉ \cㅩ \cㆉ \cㆩ \c㇉ \c㈉ \c㈩ \c㉉ \c㉩ \c㊉ \c㊩ \c㋉ \c㋩ \c㌉ \c㌩ \c㍉ \c㍩ \c㎉ \c㎩ \c㏉ \c㏩ \c㐉 \c㐩 \c㑉 \c㑩 \c㒉 \c㒩 \c㓉 \c㓩 \c㔉 \c㔩 \c㕉 \c㕩 \c㖉 \c㖩 \c㗉 \c㗩 \c㘉 \c㘩 \c㙉 \c㙩 \c㚉 \c㚩 \c㛉 \c㛩 \c㜉 \c㜩 \c㝉 \c㝩 \c㞉 \c㞩 \c㟉 \c㟩 \c㠉 \c㠩 \c㡉 \c㡩 \c㢉 \c㢩 \c㣉 \c㣩 \c㤉 \c㤩 \c㥉 \c㥩 \c㦉 \c㦩 \c㧉 \c㧩 \c㨉 \c㨩 \c㩉 \c㩩 \c㪉 \c㪩 \c㫉 \c㫩 \c㬉 \c㬩 \c㭉 \c㭩 \c㮉 \c㮩 \c㯉 \c㯩 \c㰉 \c㰩 \c㱉 \c㱩 \c㲉 \c㲩 \c㳉 \c㳩 
\c㴉 \c㴩 \c㵉 \c㵩 \c㶉 \c㶩 \c㷉 \c㷩 \c㸉 \c㸩 \c㹉 \c㹩 \c㺉 \c㺩 \c㻉 \c㻩 \c㼉 \c㼩 \c㽉 \c㽩 \c㾉 \c㾩 \c㿉 \c㿩 \c䀉 \c䀩 \c䁉 \c䁩 \c䂉 \c䂩 \c䃉 \c䃩 \c䄉 \c䄩 \c䅉 \c䅩 \c䆉 \c䆩 \c䇉 \c䇩 \c䈉 \c䈩 \c䉉 \c䉩 \c䊉 \c䊩 \c䋉 \c䋩 \c䌉 \c䌩 \c䍉 \c䍩 \c䎉 \c䎩 \c䏉 \c䏩 \c䐉 \c䐩 \c䑉 \c䑩 \c䒉 \c䒩 \c䓉 \c䓩 \c䔉 \c䔩 \c䕉 \c䕩 \c䖉 \c䖩 \c䗉 \c䗩 \c䘉 \c䘩 \c䙉 \c䙩 \c䚉 \c䚩 \c䛉 \c䛩 \c䜉 \c䜩 \c䝉 \c䝩 \c䞉 \c䞩 \c䟉 \c䟩 \c䠉 \c䠩 \c䡉 \c䡩 \c䢉 \c䢩 \c䣉 \c䣩 \c䤉 \c䤩 \c䥉 \c䥩 \c䦉 \c䦩 \c䧉 \c䧩 \c䨉 \c䨩 \c䩉 \c䩩 \c䪉 \c䪩 \c䫉 \c䫩 \c䬉 \c䬩 \c䭉 \c䭩 \c䮉 \c䮩 \c䯉 \c䯩 \c䰉 \c䰩 \c䱉 \c䱩 \c䲉 \c䲩 \c䳉 \c䳩 \c䴉 \c䴩 \c䵉 \c䵩 \c䶉 \c䶩 \c䷉ \c䷩ \c三 \c丩 \c义 \c乩 \c争 \c亩 \c仉 \c仩 \c伉 \c伩 \c佉 \c佩 \c侉 \c侩 \c俉 \c俩 \c倉 \c倩 \c偉 \c偩 \c傉 \c傩 \c僉 \c僩 \c儉 \c儩 \c光 \c兩 \c冉 \c冩 \c凉 \c凩 \c刉 \c利 \c剉 \c剩 \c劉 \c助 \c勉 \c勩 \c匉 \c匩 \c卉 \c卩 \c厉 \c厩 \c叉 \c叩 \c吉 \c吩 \c呉 \c呩 \c咉 \c咩 \c哉 \c哩 \c唉 \c唩 \c啉 \c啩 \c喉 \c喩 \c嗉 \c嗩 \c嘉 \c嘩 \c噉 \c噩 \c嚉 \c嚩 \c囉 \c囩 \c圉 \c圩 \c坉 \c坩 \c垉 \c垩 \c埉 \c埩 \c堉 \c堩 \c塉 \c塩 \c墉 \c墩 \c壉 \c壩 \c変 \c天 \c奉 \c奩 \c妉 \c妩 \c姉 \c姩 \c娉 \c娩 \c婉 \c婩 \c媉 \c媩 \c嫉 \c嫩 \c嬉 \c嬩 \c孉 \c孩 \c安 \c宩 \c寉 \c審 \c尉 \c尩 \c屉 \c屩 \c岉 \c岩 \c峉 \c峩 \c崉 \c崩 \c嵉 \c嵩 \c嶉 \c嶩 \c巉 \c巩 \c帉 \c帩 \c幉 \c幩 \c庉 \c庩 \c廉 \c廩 \c弉 \c弩 \c彉 \c彩 \c徉 \c復 \c忉 \c忩 \c怉 \c怩 \c恉 \c恩 \c悉 \c悩 \c惉 \c惩 \c愉 \c愩 \c慉 \c慩 \c憉 \c憩 \c應 \c懩 \c戉 \c戩 \c扉 \c扩 \c抉 \c抩 \c拉 \c择 \c按 \c挩 \c捉 \c捩 \c掉 \c掩 \c揉 \c揩 \c搉 \c搩 \c摉 \c摩 \c撉 \c撩 \c擉 \c擩 \c攉 \c攩 \c敉 \c敩 \c斉 \c斩 \c旉 \c早 \c昉 \c昩 \c晉 \c晩 \c暉 \c暩 \c曉 \c曩 \c有 \c朩 \c杉 \c杩 \c枉 \c枩 \c柉 \c柩 \c栉 \c栩 \c桉 \c桩 \c梉 \c梩 \c棉 \c棩 \c椉 \c椩 \c楉 \c楩 \c榉 \c榩 \c槉 \c槩 \c樉 \c権 \c橉 \c橩 \c檉 \c檩 \c櫉 \c櫩 \c欉 \c欩 \c歉 \c歩 \c殉 \c殩 \c毉 \c毩 \c氉 \c氩 \c汉 \c汩 \c沉 \c沩 \c泉 \c泩 \c洉 \c洩 \c浉 \c浩 \c涉 \c涩 \c淉 \c淩 \c渉 \c温 \c湉 \c湩 \c溉 \c溩 \c滉 \c滩 \c漉 \c漩 \c潉 \c潩 \c澉 \c澩 \c濉 \c濩 \c瀉 \c瀩 \c灉 \c灩 \c炉 \c炩 \c烉 \c烩 \c焉 \c焩 \c煉 \c煩 \c熉 \c熩 \c燉 \c燩 \c爉 \c爩 \c牉 \c物 \c犉 \c犩 \c狉 \c狩 \c猉 \c猩 \c獉 \c獩 \c玉 \c玩 \c珉 \c珩 \c琉 \c琩 \c瑉 \c瑩 \c璉 \c璩 \c瓉 \c瓩 \c甉 \c甩 \c畉 \c畩 \c疉 \c疩 \c痉 \c痩 \c瘉 \c瘩 \c癉 \c癩 \c皉 \c皩 \c盉 \c盩 \c眉 \c眩 \c睉 \c睩 \c瞉 \c瞩 \c矉 \c矩 \c砉 \c砩 \c硉 \c硩 \c碉 \c碩 \c磉 \c磩 \c礉 \c礩 \c祉 \c祩 \c禉 \c禩 \c秉 \c秩 \c稉 \c稩 \c穉 \c穩 \c窉 \c窩 \c竉 \c竩 \c笉 \c笩 \c等 \c筩 
\c箉 \c箩 \c築 \c篩 \c簉 \c簩 \c籉 \c籩 \c粉 \c粩 \c糉 \c糩 \c紉 \c紩 \c絉 \c絩 \c綉 \c綩 \c緉 \c緩 \c縉 \c縩 \c繉 \c繩 \c纉 \c纩 \c绉 \c绩 \c缉 \c缩 \c罉 \c罩 \c羉 \c義 \c翉 \c翩 \c耉 \c耩 \c聉 \c聩 \c肉 \c肩 \c胉 \c胩 \c脉 \c脩 \c腉 \c腩 \c膉 \c膩 \c臉 \c臩 \c舉 \c舩 \c艉 \c艩 \c芉 \c芩 \c苉 \c苩 \c茉 \c茩 \c草 \c荩 \c莉 \c莩 \c菉 \c菩 \c萉 \c萩 \c葉 \c葩 \c蒉 \c蒩 \c蓉 \c蓩 \c蔉 \c蔩 \c蕉 \c蕩 \c薉 \c薩 \c藉 \c藩 \c蘉 \c蘩 \c虉 \c虩 \c蚉 \c蚩 \c蛉 \c蛩 \c蜉 \c蜩 \c蝉 \c蝩 \c螉 \c螩 \c蟉 \c蟩 \c蠉 \c蠩 \c衉 \c衩 \c袉 \c袩 \c裉 \c裩 \c褉 \c褩 \c襉 \c襩 \c覉 \c覩 \c觉 \c觩 \c訉 \c訩 \c詉 \c詩 \c誉 \c誩 \c諉 \c諩 \c謉 \c謩 \c證 \c譩 \c讉 \c让 \c诉 \c诩 \c谉 \c谩 \c豉 \c豩 \c貉 \c販 \c賉 \c賩 \c贉 \c贩 \c赉 \c赩 \c趉 \c趩 \c跉 \c跩 \c踉 \c踩 \c蹉 \c蹩 \c躉 \c躩 \c軉 \c軩 \c載 \c輩 \c轉 \c轩 \c辉 \c辩 \c迉 \c迩 \c选 \c逩 \c遉 \c適 \c邉 \c邩 \c郉 \c郩 \c鄉 \c鄩 \c酉 \c酩 \c醉 \c醩 \c釉 \c釩 \c鈉 \c鈩 \c鉉 \c鉩 \c銉 \c銩 \c鋉 \c鋩 \c錉 \c錩 \c鍉 \c鍩 \c鎉 \c鎩 \c鏉 \c鏩 \c鐉 \c鐩 \c鑉 \c鑩 \c钉 \c钩 \c铉 \c铩 \c锉 \c锩 \c镉 \c镩 \c閉 \c閩 \c闉 \c闩 \c阉 \c阩 \c陉 \c险 \c隉 \c隩 \c雉 \c雩 \c霉 \c霩 \c靉 \c革 \c鞉 \c鞩 \c韉 \c韩 \c頉 \c頩 \c顉 \c顩 \c颉 \c颩 \c飉 \c飩 \c餉 \c餩 \c饉 \c饩 \c馉 \c馩 \c駉 \c駩 \c騉 \c騩 \c驉 \c驩 \c骉 \c骩 \c髉 \c髩 \c鬉 \c鬩 \c魉 \c魩 \c鮉 \c鮩 \c鯉 \c鯩 \c鰉 \c鰩 \c鱉 \c鱩 \c鲉 \c鲩 \c鳉 \c鳩 \c鴉 \c鴩 \c鵉 \c鵩 \c鶉 \c鶩 \c鷉 \c鷩 \c鸉 \c鸩 \c鹉 \c鹩 \c麉 \c麩 \c黉 \c黩 \c鼉 \c鼩 \c齉 \c齩 \c龉 \c龩 \c鿉 \c鿩 \cꀉ \cꀩ \cꁉ \cꁩ \cꂉ \cꂩ \cꃉ \cꃩ \cꄉ \cꄩ \cꅉ \cꅩ \cꆉ \cꆩ \cꇉ \cꇩ \cꈉ \cꈩ \cꉉ \cꉩ \cꊉ \cꊩ \cꋉ \cꋩ \cꌉ \cꌩ \cꍉ \cꍩ \cꎉ \cꎩ \cꏉ \cꏩ \cꐉ \cꐩ \cꑉ \cꑩ \cꒉ \c꒩ \cꓩ \cꔉ \cꔩ \cꕉ \cꕩ \cꖉ \cꖩ \cꗉ \cꗩ \cꘉ \c꘩ \cꙉ \cꙩ \cꚉ \cꚩ \cꛉ \cꛩ \c꜉ \cꜩ \cꝉ \cꝩ \c꞉ \cꞩ \cꟉ \cꠉ \c꠩ \cꡉ \cꡩ \cꢉ \cꢩ \c꣩ \c꤉ \cꤩ \cꥉ \cꥩ \cꦉ \cꦩ \c꧉ \cꧩ \cꨉ \cꨩ \cꩉ \cꩩ \cꪉ \cꪩ \cꫩ \cꬉ \cꬩ \cꭉ \cꭩ \cꮉ \cꮩ \cꯉ \cꯩ \cퟩ \c契 \c朗 \c雷 \c數 \c黎 \c囹 \c柳 \c里 \c降 \c﨩 \c爫 \c響 \c憎 \c睊 \c韛 \c﬩ \cשּ \cﭩ \cﮉ \cﮩ \cﯩ \cﰉ \cﰩ \cﱉ \cﱩ \cﲉ \cﲩ \cﳉ \cﳩ \cﴉ \cﴩ \c﵉ \cﵩ \cﶉ \cﶩ \c︉ \c︩ \c﹉ \c﹩ \cﺉ \cﺩ \cﻉ \cﻩ \c) \cI \ci \cゥ \cノ \cᄅ \c←All of those are Unicode characters where the lower five bits are
01001. The intent behind \c☒ was that it would be used with \ci or \cI, as Ctrl+I is a tab.
-
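For anyone who wants to check the arithmetic, here is a minimal Python sketch of the rule as I understand it from the tests above. It only models the bit masking and is not how the regex engine is actually implemented; the names `ctrl_char` and `tab_aliases` are made up for this illustration.

```python
def ctrl_char(ch: str, bits: int = 5) -> str:
    """Return the character whose code is the lowest `bits` bits of ch."""
    cp = ord(ch)
    if cp > 0xFFFF:
        # Assumption: only BMP characters are modelled here.
        raise ValueError("only U+0000 to U+FFFF are modelled")
    return chr(cp & ((1 << bits) - 1))


# A few of the patterns listed above; each one masks down to U+0009 (tab).
for ch in "Ii)©É":
    print(f"\\c{ch} -> U+{ord(ctrl_char(ch)):04X}")

# If the manual's "6 lowest order bits" were right, \cI would still be a tab
# but \ci would come out as ")" (U+0029), which is not what happens.
print(repr(ctrl_char("I", bits=6)), repr(ctrl_char("i", bits=6)))

# Every BMP code point whose lower five bits are 01001.
tab_aliases = [cp for cp in range(0x10000) if cp & 0x1F == 0x09]
print(len(tab_aliases))
```

The count covers all code points, including unassigned ones and surrogates, so it is larger than the number of characters that can actually be typed after \c.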
-
Hello, @mkupper, @peterjones and All,
I had a look at the part of the N++ documentation regarding the way to find the C0 Control chars, mentioned by @mkupper!

Remember that the Unicode C0 Control characters range is the range [\x00-\x1F], ONLY!
Regarding the \c☒ notation, the Boost regex engine follows the rules of the equivalence table below:

                                                     0020 0040 0060 0080    00A0 00C0 00E0 0100 0120 ...  FF80
                                                     003F 005F 007F 009F    00BF 00DF 00FF 011F 013F ...  FF9F
    \x00 = NUL ( NULL )                     = \x00 = \c   \c@  \c`  \cPAD   \c   \cÀ  \cà  \cĀ  \cĠ  ...  \cﾀ
    \x01 = SOH ( START of HEADER )          = \x01 = \c!  \cA  \ca  \cHOP   \c¡  \cÁ  \cá  \cā  \cġ  ...  \cﾁ
    \x02 = STX ( START of TEXT )            = \x02 = \c"  \cB  \cb  \cBPH   \c¢  \cÂ  \câ  \cĂ  \cĢ  ...  \cﾂ
    \x03 = ETX ( END of TEXT )              = \x03 = \c#  \cC  \cc  \cNBH   \c£  \cÃ  \cã  \că  \cģ  ...  \cﾃ
    \x04 = EOT ( END of TRANSMISSION )      = \x04 = \c$  \cD  \cd  \cIND   \c¤  \cÄ  \cä  \cĄ  \cĤ  ...  \cﾄ
    \x05 = ENQ ( ENQUIRY )                  = \x05 = \c%  \cE  \ce  \cNEL   \c¥  \cÅ  \cå  \cą  \cĥ  ...  \cﾅ
    \x06 = ACK ( ACKNOWLEDGEMENT )          = \x06 = \c&  \cF  \cf  \cSSA   \c¦  \cÆ  \cæ  \cĆ  \cĦ  ...  \cﾆ
    \x07 = BEL ( BELL )                     = \x07 = \c'  \cG  \cg  \cESA   \c§  \cÇ  \cç  \cć  \cħ  ...  \cﾇ
    \x08 = BS  ( BACK SPACE )               = \x08 = \c(  \cH  \ch  \cHTS   \c¨  \cÈ  \cè  \cĈ  \cĨ  ...  \cﾈ
    \x09 = TAB ( HORIZONTAL TABULATION )    = \x09 = \c)  \cI  \ci  \cHTJ   \c©  \cÉ  \cé  \cĉ  \cĩ  ...  \cﾉ
    \x0A = LF  ( LINE FEED )                = \x0A = \c*  \cJ  \cj  \cVTS   \cª  \cÊ  \cê  \cĊ  \cĪ  ...  \cﾊ
    \x0B = VT  ( VERTICAL TABULATION )      = \x0B = \c+  \cK  \ck  \cPLD   \c«  \cË  \cë  \cċ  \cī  ...  \cﾋ
    \x0C = FF  ( FORM FEED )                = \x0C = \c,  \cL  \cl  \cPLU   \c¬  \cÌ  \cì  \cČ  \cĬ  ...  \cﾌ
    \x0D = CR  ( CARRIAGE RETURN )          = \x0D = \c-  \cM  \cm  \cRI    \c­  \cÍ  \cí  \cč  \cĭ  ...  \cﾍ
    \x0E = SO  ( SHIFT OUT )                = \x0E = \c.  \cN  \cn  \cSS2   \c®  \cÎ  \cî  \cĎ  \cĮ  ...  \cﾎ
    \x0F = SI  ( SHIFT IN )                 = \x0F = \c/  \cO  \co  \cSS3   \c¯  \cÏ  \cï  \cď  \cį  ...  \cﾏ
    \x10 = DLE ( DATA LINK ESCAPE )         = \x10 = \c0  \cP  \cp  \cDCS   \c°  \cÐ  \cð  \cĐ  \cİ  ...  \cﾐ
    \x11 = DC1 ( DEVICE CONTROL 1 )         = \x11 = \c1  \cQ  \cq  \cPU1   \c±  \cÑ  \cñ  \cđ  \cı  ...  \cﾑ
    \x12 = DC2 ( DEVICE CONTROL 2 )         = \x12 = \c2  \cR  \cr  \cPU2   \c²  \cÒ  \cò  \cĒ  \cĲ  ...  \cﾒ
    \x13 = DC3 ( DEVICE CONTROL 3 )         = \x13 = \c3  \cS  \cs  \cSTS   \c³  \cÓ  \có  \cē  \cĳ  ...  \cﾓ
    \x14 = DC4 ( DEVICE CONTROL 4 )         = \x14 = \c4  \cT  \ct  \cCCH   \c´  \cÔ  \cô  \cĔ  \cĴ  ...  \cﾔ
    \x15 = NAK ( NEGATIVE ACKNOWLEDGEMENT ) = \x15 = \c5  \cU  \cu  \cMW    \cµ  \cÕ  \cõ  \cĕ  \cĵ  ...  \cﾕ
    \x16 = SYN ( SYNCHRONOUS IDLE )         = \x16 = \c6  \cV  \cv  \cSPA   \c¶  \cÖ  \cö  \cĖ  \cĶ  ...  \cﾖ
    \x17 = ETB ( END TRANSMISSION BLOCK )   = \x17 = \c7  \cW  \cw  \cEPA   \c·  \c×  \c÷  \cė  \cķ  ...  \cﾗ
    \x18 = CAN ( CANCEL )                   = \x18 = \c8  \cX  \cx  \cSOS   \c¸  \cØ  \cø  \cĘ  \cĸ  ...  \cﾘ
    \x19 = EM  ( END of MEDIUM )            = \x19 = \c9  \cY  \cy  \cSGCI  \c¹  \cÙ  \cù  \cę  \cĹ  ...  \cﾙ
    \x1A = SUB ( SUBSTITUTION )             = \x1A = \c:  \cZ  \cz  \cSCI   \cº  \cÚ  \cú  \cĚ  \cĺ  ...  \cﾚ
    \x1B = ESC ( ESCAPE )                   = \x1B = \c;  \c[  \c{  \cCSI   \c»  \cÛ  \cû  \cě  \cĻ  ...  \cﾛ
    \x1C = FS  ( FILE SEPARATOR )           = \x1C = \c<  \c\  \c|  \cST    \c¼  \cÜ  \cü  \cĜ  \cļ  ...  \cﾜ
    \x1D = GS  ( GROUP SEPARATOR )          = \x1D = \c=  \c]  \c}  \cOSC   \c½  \cÝ  \cý  \cĝ  \cĽ  ...  \cﾝ
    \x1E = RS  ( RECORD SEPARATOR )         = \x1E = \c>  \c^  \c~  \cPM    \c¾  \cÞ  \cþ  \cĞ  \cľ  ...  \cﾞ
    \x1F = US  ( UNIT SEPARATOR )           = \x1F = \c?  \c_  \cDEL \cAPC  \c¿  \cß  \cÿ  \cğ  \cĿ  ...  \cﾟ
- Note that the values under the 0080 - 009F column represent the string \c followed by the true C1 Control char, in the range [\x80-\x9F].
- So, paradoxically, these C1 Control values may also be used to identify the C0 Control characters!!
Thus, for example, if you want to search for any SHIFT OUT control char (\x0E), you can use any of these regexes:

\x0E, \x{0E} or \x{000E}

\c. \cN \cn \cSS2 \c® \cÎ \cî \cĎ \cĮ ... \cﾎ (where SS2 stands for the true C1 Control char \x8E)
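To rebuild one row of the equivalence table for any C0 control code, the following Python sketch may help. It assumes the plain "lower five bits" rule and stops at the 0120 column by default; `ctrl_aliases` and its `last_base` parameter are hypothetical names used only for this example.

```python
def ctrl_aliases(code: int, last_base: int = 0x120) -> list[str]:
    """Characters that, placed after \\c, should reach C0 control `code`.

    One character per 32-wide column, from 0020 up to `last_base`.
    """
    if not 0x00 <= code <= 0x1F:
        raise ValueError("code must be a C0 control, \\x00 to \\x1F")
    return [chr(base | code) for base in range(0x20, last_base + 1, 0x20)]


# The SHIFT OUT row: U+002E, U+004E, U+006E, U+008E, U+00AE, ...
shift_out = ctrl_aliases(0x0E)
print(" ".join(f"U+{ord(c):04X}" for c in shift_out))

# The last BMP column shown in the table starts at FF80.
print(f"U+{0xFF80 | 0x0E:04X}")
```

The fourth entry is the invisible C1 control char \x8E, which is why the code points are printed rather than the characters themselves.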
So, Peter, when you say that the search \c1 matches the SOH char (\x01), it's not exact. The \c1 search does match the DC1 char (\x11)!

And I confirm that any \c string, followed by a char outside the BMP (so over \x{FFFF}), cannot be used to reach a C0 control char!

Best Regards,
guy038
-
-
@guy038 said in Minor typo in the manual for regex control character \c☒:
So, Peter when you say that the search \c1 matches
I didn’t say that. Most of the Regex documentation was direct copy/paste from the original Wiki version that the Manual was derived from, including that original phrasing. (It had been edited over time, but the original version still had it described essentially the same)
I will fix it, but it wasn’t my mistake originally. (Given that 1 and ! are on the same key on US keyboards, whoever typed that line in the original Wiki probably just didn’t hold down the shift key while trying to type the correct \c! for the SOH.)

I will update the manual so it doesn’t use that example at all, and instead just keep the \ca and \cA versions, since those are the ones that are mnemonically helpful.
-
Hi, @mkupper, @peterjones, and All,
Yes, @peterjones, you’re right about it: the \cA and \ca syntaxes seem the only pertinent ones, in addition to the \x## notation too!

BR
guy038
-
@guy038, @peterjones, and others.
It turns out the \c☒ topic gets fairly messy, and is far too messy to document in detail in the manual. I started playing with ANSI…

\c☒ with ANSI or ASCII codes \x00 to \x7F works well and searches for the lower five bits of the ☒ character. Realistically, you should only do it with A-Z or a-z. Better yet is to use \x## or \x{####} style expressions, as it’s clearer what is being searched for.

A case-sensitive search for \c☒ using ANSI codes \x80 to \xFF matches ANSI codes in the \xE0 to \xFF range, with some exceptions… The logic first extracts the lower five bits of ☒ and then bitwise-ORs that with 11100000 or 0xE0. For example, all of these will match ANSI character 0xEC, which is ì.

    Hex   Pattern
    \x8C  \cŒ
    \xAC  \c¬
    \xCC  \cÌ
    \xEC  \cì

The lower five bits of the above hex codes ‘\x8C’, ‘\xAC’, ‘\xCC’, and ‘\xEC’ are 01100 or \x0C, and we bitwise-OR that result with 11100000 or 0xE0 to search for \xEC.

It turns out that, with one exception, all of the ANSI characters in the \xE0 to \xFF range are lower case letters. A case-insensitive search for \c☒ using ANSI codes \x80 to \xFF works just like the case-sensitive version I just described but also matches the upper case forms of the letters in the \xE0 to \xFF range.

The one exception is ANSI character code \xF7, which is the division sign ÷. A search for \c—, \c·, \c×, or \c÷ only matches ÷ when you use a case-insensitive search.

Searching for \c<space> (\x20), \c@ (\x40), \c` (\x60), \c€ (\x80), \c<NBSP> (\xA0), \cÀ (\xC0), and \cà (\xE0) all match NUL (\x00) in ANSI encoded files. With one exception, they also match NUL (\x{0000}) in UTF-8 encoded files. The exception is that searching for \c€ (\x80) matches \x{000C} (form feed) and not NUL \x{0000}, since in UTF-8 files € is U+20AC, whose lower five bits are 01100.

Because searches for \c€ (\x80), \c<NBSP> (\xA0), \cÀ (\xC0), and \cà (\xE0) all match NUL (\x00) in ANSI files, you can’t use them to match the lower case à at ANSI character \xE0 nor its upper-case À at \xC0.
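Here is a small Python sketch of the ANSI rule as I observed it. It is only a model of the results reported above, not the engine's code, and it assumes the ANSI code page is Windows-1252. The function name `ansi_ctrl_target` is made up, and the ÷ case-sensitivity quirk is not modelled.

```python
def ansi_ctrl_target(byte: int) -> int:
    """Model which ANSI byte a case-sensitive \\c☒ search lands on,
    for ☒ in \\x80 to \\xFF."""
    if not 0x80 <= byte <= 0xFF:
        raise ValueError("this model only covers \\x80 to \\xFF")
    low = byte & 0x1F          # keep the lower five bits
    if low == 0:
        return 0x00            # \x80, \xA0, \xC0, \xE0 were seen to match NUL
    return low | 0xE0          # otherwise OR with 11100000


for b in (0x8C, 0xAC, 0xCC, 0xEC):
    target = ansi_ctrl_target(b)
    shown = bytes([b]).decode("cp1252")
    found = bytes([target]).decode("cp1252")
    print(f"\\x{b:02X}  \\c{shown}  ->  \\x{target:02X}  {found}")
```

Each of the four lines should end in \xEC and ì, matching the Hex / Pattern table.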
I also ran across that, while Notepad++ supports searching for \x00 or \x{0000}, both of which match a NUL (\x00 or \x{0000}) in a file, using \x00 or \x{0000} in the replacement part results in the replacement string getting terminated at the NUL (\x00 or \x{0000}) character.

As replacement strings are terminated at the NUL, using \c~ where the ~ is a NUL (\x00) returns Invalid Regular Expression with the details being: ASCII escape sequence terminated prematurely. The error occurred while parsing the regular expression: '>>>HERE>>>\c'.

Using a search for xxx and a replace of aaa\x00zzz or aaa\x{0000}zzz both result in xxx being replaced with aaa, as the replacement string was terminated at the NUL. Apparently the engine first does a pass where it converts the \x☒☒ and \x{☒☒☒☒} forms of characters into the actual character value, meaning \x00 or \x{0000} in a replacement simply terminates the string at that point.

I suspect that bug could be used to add a comment to the replacement!

Search: Hello
Replace: World\x0 This will never happen

Windows also uses NUL as the text string terminator in its copy/paste system.
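The truncation can be imitated in a few lines of Python. This is purely a guess at the two-pass behaviour described above (expand the hex escapes, then cut at the first NUL), not the actual Notepad++ or Boost code; `effective_replacement` is a hypothetical helper, and accepting one or two hex digits after \x is my assumption.

```python
import re

# \x{HHHH} with any number of hex digits, or \x followed by 1 or 2 hex digits.
HEX_ESCAPE = re.compile(r"\\x\{([0-9A-Fa-f]+)\}|\\x([0-9A-Fa-f]{1,2})")


def effective_replacement(replace_field: str) -> str:
    """Expand hex escapes, then stop at the first NUL, as a C string would."""
    expanded = HEX_ESCAPE.sub(
        lambda m: chr(int(m.group(1) or m.group(2), 16)), replace_field
    )
    return expanded.split("\x00", 1)[0]


print(effective_replacement(r"aaa\x00zzz"))
print(effective_replacement(r"aaa\x{0000}zzz"))
print(effective_replacement(r"World\x0 This will never happen"))
```

Under this model the three calls give aaa, aaa and World, which is what the replacements above produced.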