    BUG: N++ does not keep in UTF8 unsaved open files

    Tags: bug, cyrillic, utf8 encoding
      dz15mlru

      BUG,
      I’m using N++ with a lot of unsaved open files, and in the settings I have the option enabled for all new documents to be in UTF-8. But in recent times I’ve discovered that a number of my unsaved open documents with Cyrillic content are not kept in UTF-8 by N++ and are converted to “Cyrillic -> Macintosh”, and some Cyrillic text is malformed, unintelligible and lost. After converting back to UTF-8 the text remains malformed.
      https://i.imgur.com/jt05fe5.png

        Coises @dz15mlru

        @dz15mlru said in BUG: N++ does not keep in UTF8 unsaved open files:

        I’m using N++ with a lot of unsaved open files, and in the settings I have the option enabled for all new documents to be in UTF-8. But in recent times I’ve discovered that a number of my unsaved open documents with Cyrillic content are not kept in UTF-8 by N++ and are converted to “Cyrillic -> Macintosh”, and some Cyrillic text is malformed, unintelligible and lost.

        First, would you open the ? menu, select Debug Info… and paste the information here? That helps make sure we know some details that can be important when analyzing bugs.

        Second, at Settings | Preferences… | MISC. is the box labeled Autodetect character encoding checked? If it is, try unchecking it. That option sometimes does more harm than good.

        in recent times

        If there is any way you can remember or otherwise figure out what change(s) might have happened around the time this changed, it will help with working out what is happening.

        After converting back to UTF-8 the text remains malformed.

        Once the text in the edit window is garbled, using one of the Encoding | Convert to options will never help. Those just convert what you’re already seeing to a different encoding.
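
        A minimal sketch of why Convert to cannot repair this, written in Python rather than taken from Notepad++ itself. It assumes the damage came from UTF-8 bytes being decoded as Mac Cyrillic (Python’s mac_cyrillic codec stands in for “Cyrillic -> Macintosh”), and the sample word is made up for illustration.

```python
# Illustration only: Python codecs stand in for Notepad++'s encoding handling.
original = "Привет"                      # hypothetical sample text
raw = original.encode("utf-8")           # the bytes actually stored on disk

# Wrong guess: the UTF-8 bytes are interpreted as Mac Cyrillic.
garbled = raw.decode("mac_cyrillic")

# "Convert to UTF-8" style operation: re-encode the characters already shown.
# The garbled characters survive the conversion unchanged.
converted = garbled.encode("utf-8").decode("utf-8")

# Reinterpreting the original bytes as UTF-8 is what recovers the text.
recovered = raw.decode("utf-8")

print(garbled == original)    # False
print(converted == garbled)   # True
print(recovered == original)  # True
```

        The recovery step only works while the original bytes still exist; once the garbled characters have been written back out, there is nothing left to reinterpret.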

        I never use persistent unsaved files, so hopefully someone else will come along with experience about how to manage an unsaved file carried over from a previous session (I assume that’s the condition you’re describing) that opens in the wrong encoding. If it were a saved file that you were opening anew, the right thing to do would be to select the correct encoding from the top of the Encoding menu (not the Convert to options at the bottom) before making any changes. I don’t know if that works with persistent unsaved files, though.

          PeterJones @dz15mlru

          @dz15mlru said in BUG: N++ does not keep in UTF8 unsaved open files:

          BUG,
          I’m using N++ with a lot of unsaved open files, and in the settings I have the option enabled for all new documents to be in UTF-8. But in recent times I’ve discovered that a number of my unsaved open documents with Cyrillic content are not kept in UTF-8 by N++ and are converted to “Cyrillic -> Macintosh”

          If you have a new UTF-8 file, the session file stores its encoding as “-1”, which I believe means it will use auto-detection the next time around. And the auto-detection that Notepad++ uses is imperfect, as any encoding auto-detection will be. This is explained in the new “Encoding” section in the User Manual, but I just discovered that the manual stopped publishing updates a few days ago, so until it publishes, you can read the encoding description in the repo instead.
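
          To see which remembered files would fall back to auto-detection under that reading of “-1”, something like the following could be used. This is a rough sketch, not a Notepad++ tool: the XML fragment is a hypothetical, cut-down session file, the attribute names filename and encoding are assumed from a typical session.xml, and the value 4 is an arbitrary placeholder for "anything other than -1".

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified session fragment; real files carry many more attributes.
SAMPLE_SESSION = """
<NotepadPlus>
  <Session activeView="0">
    <mainView activeIndex="0">
      <File filename="new 1" encoding="-1" />
      <File filename="notes.txt" encoding="4" />
    </mainView>
  </Session>
</NotepadPlus>
"""

def files_without_stored_encoding(session_xml: str) -> list[str]:
    """Return the filenames whose encoding attribute is "-1"."""
    root = ET.fromstring(session_xml)
    return [
        entry.get("filename", "")
        for entry in root.iter("File")
        if entry.get("encoding") == "-1"
    ]

print(files_without_stored_encoding(SAMPLE_SESSION))  # ['new 1']
```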

          As @coises suggested while I was writing this up, try turning off the auto-detection, and it should prevent that in the future.

          And the “Convert to…” won’t work to fix what you are seeing on your existing files… but maybe Encoding > UTF-8 will cause it to re-interpret the bytes correctly (assuming the bytes haven’t been re-written at this point to something else).

            PeterJones @Coises

            @Coises said in BUG: N++ does not keep in UTF8 unsaved open files:

            If there is any way you can remember or otherwise figure out what change(s) might have happened around the time this changed, it will help with working out what is happening

            Assuming autodetection is on (and that’s the best assumption, given the data), it depends on what other characters are also in the file: if you get a combination of bytes that looks to the algorithm like “Cyrillic -> Macintosh” instead of “UTF-8”, then it will pick that. So “in recent times” may mean that additional text was added to those files, which makes the algorithm think they look the way “Cyrillic -> Macintosh” should look.

              Coises @PeterJones

              @PeterJones said in BUG: N++ does not keep in UTF8 unsaved open files:

              Assuming autodetection is on (and that’s the best assumption, given the data), it depends on what other characters are also in the file: if you get a combination of bytes that looks to the algorithm like “Cyrillic -> Macintosh” instead of “UTF-8”, then it will pick that. So “in recent times” may mean that additional text was added to those files, which makes the algorithm think they look the way “Cyrillic -> Macintosh” should look.

              The thing is… it is very unusual for a UTF-8 file of any size that contains non-ASCII characters to “look like” anything but UTF-8. (Unless Cyrillic/Macintosh is some strange exception.) I suspect something is going on here that we haven’t heard about yet.
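
              A small illustration of that point, in Python and not based on Notepad++’s actual detector: strict UTF-8 decoding is a demanding test, so genuine UTF-8 passes it while the same text stored in a single-byte Cyrillic code page normally fails. The sample text and the helper name is_valid_utf8 are made up for this sketch.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode as strict UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

text = "Привет, мир"                      # hypothetical sample text

utf8_bytes = text.encode("utf-8")         # multi-byte sequences with strict structure
mac_bytes = text.encode("mac_cyrillic")   # one byte per Cyrillic letter

print(is_valid_utf8(utf8_bytes))  # True
print(is_valid_utf8(mac_bytes))   # False
```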

              One possibility: new files are set to open as UTF-8 but “Apply to opened ANSI files” is not checked, and the user exits when a file has only ASCII characters. On re-opening, perhaps (as I said, I don’t use Remember current session) Notepad++ opens it as ANSI; the user doesn’t notice and adds non-ASCII characters. Now it really would be in something other than UTF-8, but why it would be mis-identified as the wrong Cyrillic code page, I don’t know.
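
              That scenario can be sketched in Python under one loud assumption: cp1251 stands in for the system ANSI code page, which depends on the Windows locale and is not stated in this thread. The file content is hypothetical.

```python
ascii_only = "shopping list"              # hypothetical file content at exit

# ASCII text is byte-for-byte identical in UTF-8 and cp1251,
# so nothing in the bytes says which encoding was intended.
same_bytes = ascii_only.encode("utf-8") == ascii_only.encode("cp1251")
print(same_bytes)  # True

# If the file is reopened as ANSI and Cyrillic is added,
# the new text is stored in the code page, not in UTF-8.
edited = ascii_only + " молоко"
ansi_bytes = edited.encode("cp1251")

try:
    ansi_bytes.decode("utf-8")
    decodes_as_utf8 = True
except UnicodeDecodeError:
    decodes_as_utf8 = False
print(decodes_as_utf8)  # False
```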
