BUG: N++ does not keep in UTF8 unsaved open files
-
BUG,
I’m using N++ with a lot of unsaved open files and I have in settings the option set for all new documents to be in UTF-8, but in recent times I’ve discovered that a number of my unsaved open documents with content in Cyrillic are not kept in UTF8 by N++ and are converted to “Cyrillic -> Macintosh” and some Cyrillic text is malformed, unintelligible and lost. After converting back to UTF8 the text remains to be malformed.
https://i.imgur.com/jt05fe5.png
-
@dz15mlru said in BUG: N++ does not keep in UTF8 unsaved open files:
I’m using N++ with a lot of unsaved open files and I have in settings the option set for all new documents to be in UTF-8, but in recent times I’ve discovered that a number of my unsaved open documents with content in Cyrillic are not kept in UTF8 by N++ and are converted to “Cyrillic -> Macintosh” and some Cyrillic text is malformed, unintelligible and lost.
First, would you open the ? menu, select Debug Info… and paste the information here? That helps make sure we know some details that can be important when analyzing bugs.
Second, at Settings | Preferences… | MISC. is the box labeled Autodetect character encoding checked? If it is, try unchecking it. That option sometimes does more harm than good.
in recent times
If there is any way you can remember or otherwise figure out what change(s) might have happened around the time this changed, it will help with working out what is happening.
After converting back to UTF8 the text remains to be malformed.
Once the text in the edit window is garbled, using one of the Encoding | Convert to options will never help. Those just convert what you’re already seeing to a different encoding.
I never use persistent unsaved files, so hopefully someone else will come along with experience about how to manage an unsaved file carried over from a previous session (I assume that’s the condition you’re describing) that opens in the wrong encoding. If it were a saved file that you were opening anew, the right thing to do would be to select the correct encoding from the top of the Encoding menu (not the Convert to options at the bottom) before making any changes. I don’t know if that works with persistent unsaved files, though.
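To illustrate the difference between converting and re-interpreting, here is a minimal Python sketch. It only models the idea with Python's own codecs (`mac_cyrillic` stands in for “Cyrillic -> Macintosh”); it is not how Notepad++ is implemented, and the sample text is made up.

```python
# Sketch only: models "wrong encoding on open" with Python codecs.
original = "Привет, мир"
utf8_bytes = original.encode("utf-8")        # what is actually stored

# Opening the UTF-8 bytes as Mac Cyrillic gives garbled text.
garbled = utf8_bytes.decode("mac_cyrillic")
assert garbled != original

# "Convert to UTF-8" re-encodes the characters already on screen,
# so the garbled characters are simply stored as UTF-8.
converted = garbled.encode("utf-8").decode("utf-8")
assert converted == garbled                   # still garbled

# Re-interpreting the same bytes as UTF-8 recovers the text,
# as long as the bytes were not rewritten in between.
reinterpreted = garbled.encode("mac_cyrillic").decode("utf-8")
assert reinterpreted == original
```

The last step only works while the underlying bytes are untouched, which is why the encoding should be chosen before making any changes.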
-
@dz15mlru said in BUG: N++ does not keep in UTF8 unsaved open files:
BUG,
I’m using N++ with a lot of unsaved open files and I have in settings the option set for all new documents to be in UTF-8, but in recent times I’ve discovered that a number of my unsaved open documents with content in Cyrillic are not kept in UTF8 by N++ and are converted to “Cyrillic -> Macintosh”
If you have a new UTF-8 file, the session file stores its encoding as “-1”, which I believe means it will use auto-detection the next time around. And the auto-detection that Notepad++ uses is imperfect (as any encoding autodetection will be; this is explained in the new “Encoding” section in the User Manual, but I just discovered that the manual stopped publishing updates a few days ago, so until it publishes, you can read the encoding description in the repo instead).
As @coises suggested while I was writing this up, try turning off the auto-detection, and it should prevent that in the future.
And the “Convert to…” won’t work to fix what you are seeing on your existing files… but maybe Encoding > UTF-8 will cause it to re-interpret the bytes correctly (assuming the bytes haven’t been re-written at this point to something else).
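If you want to see which tabs have “-1” stored, a rough Python sketch like the one below could list them. The XML layout here is an assumption (`File` elements carrying `filename` and `encoding` attributes), and the sample content, including the value "4", is invented for illustration, so compare it against your own session file before relying on it.

```python
import xml.etree.ElementTree as ET

# Invented sample; the element and attribute names are assumptions
# about the session file layout, not copied from a real file.
SAMPLE_SESSION = """<NotepadPlus>
  <Session activeView="0">
    <mainView activeIndex="0">
      <File filename="new 1" encoding="-1" />
      <File filename="notes.txt" encoding="4" />
    </mainView>
  </Session>
</NotepadPlus>"""


def files_without_stored_encoding(xml_text):
    """Return the filenames of File entries whose encoding is "-1"."""
    root = ET.fromstring(xml_text)
    return [
        entry.get("filename")
        for entry in root.iter("File")
        if entry.get("encoding") == "-1"
    ]


print(files_without_stored_encoding(SAMPLE_SESSION))
```

Those are the entries that, as I understand it, would go through auto-detection the next time they are loaded.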
-
@Coises said in BUG: N++ does not keep in UTF8 unsaved open files:
If there is any way you can remember or otherwise figure out what change(s) might have happened around the time this changed, it will help with working out what is happening
Assuming autodetection is on (and that’s the best assumption, given the data), the result depends on what other characters are also in the file: if you get a combination of bytes that looks to the algorithm more like “Cyrillic -> Macintosh” than “UTF-8”, it will pick that. So “in recent times” may mean that additional text was added to those files, which made the algorithm think the content looks the way “Cyrillic -> Macintosh” should look.
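A small Python sketch of why a detector has to guess at all: the same bytes can be valid in several encodings, so validity alone cannot settle the question. This is not the algorithm Notepad++ uses; the helper name, candidate list and sample text are made up for illustration.

```python
def decodes_cleanly(data, encoding):
    """Return True if the bytes decode in this encoding without an error."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


CANDIDATES = ["utf-8", "mac_cyrillic", "koi8_r"]
text = "Заметки на завтра"

# UTF-8 bytes are also "valid" Mac Cyrillic and KOI8-R, just with a
# different meaning, so the detector must rank candidates statistically.
utf8_bytes = text.encode("utf-8")
plausible_for_utf8 = [e for e in CANDIDATES if decodes_cleanly(utf8_bytes, e)]
assert plausible_for_utf8 == ["utf-8", "mac_cyrillic", "koi8_r"]

# The reverse is less ambiguous: Mac Cyrillic bytes for this text
# are not valid UTF-8, so that candidate can be ruled out.
mac_bytes = text.encode("mac_cyrillic")
plausible_for_mac = [e for e in CANDIDATES if decodes_cleanly(mac_bytes, e)]
assert plausible_for_mac == ["mac_cyrillic", "koi8_r"]
```

Since every candidate survives for the UTF-8 bytes, whichever one is picked comes down to what the content looks like, and that can change as text is added.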