Notepad++ and NUL characters
-
A change is coming soon to Notepad++ which will allow proper handling of NUL characters when searching; see https://github.com/notepad-plus-plus/notepad-plus-plus/pull/16469 for a preview.
This got me wondering how well Notepad++ is positioned to be a “file editor” rather than simply a “text editor”.
Let’s discard for a moment that it doesn’t have native hex editor capabilities, and let’s pretend that arbitrary editing of files with encoded/binary content isn’t dangerous…
I’m specifically wondering how ready Notepad++ is to not have ANY stumbling blocks over the NUL character. Historically this has been a problem because a NUL character has been used internally by Notepad++ to signify that “the content of this string variable ends here”. Ideally, a NUL character is not treated that way, and is the same as any other character.
Does anyone interested in this conversation have examples of things you’d want to do with Notepad++ that you currently can’t do, because the NUL character gets in the way?
-
@Alan-Kilborn
Most plugin messages that select text from a file (e.g., SCI_GETTEXT, SCI_GETSELTEXT) still treat the text as a NUL-terminated string.
-
@Mark-Olson said in Notepad++ and NUL characters:
Most plugin messages that select text from a file (e.g., SCI_GETTEXT, SCI_GETSELTEXT) still treat the text as a NUL-terminated string.
Indeed, but I think the ultimate limiting factor is the Win32 API, which is basically ANSI C (properly speaking, it’s the C++98 standard, which is so old as to no longer even resemble C++ as most developers understand it today).
The NUL byte will always have special significance as long as you are passing “string” data to C library functions. C has no dedicated “string” type; the closest thing is an array of char, and such arrays have no way of storing their length (as, for example, a Pascal string can ¹) except the presence of a NUL byte to mark the end.
¹ The old Macintosh operating system used Pascal strings everywhere. Many C programmers on other platforms used Pascal strings for speed. Excel uses Pascal strings internally, which is why strings in many places in Excel are limited to 255 bytes, and it’s also one reason Excel is blazingly fast.
For a long time, if you wanted to put a Pascal string literal in your C code, you had to write:
char* str = "\006Hello!";
-
I have no idea what additional possibilities this would open up for me if it arrives; Npp is and remains a pure text editor for me.
I’m a bit surprised that this is being implemented because, from my point of view, it signals that any file you can search this way can also be edited, and that’s just not the case. But it is what it is, or what it will be.