
    Code page weird function; CP/Windows-1250

    General Discussion
    conversion, encoding
    5 Posts 2 Posters 4.5k Views
    • Uzivatel919
      last edited by

      I’ve run into code-page behavior that I don’t understand. Try this:

      1. Create a new file.
      2. Type some characters.
      3. Save the file under any name, with any extension.
      4. Change the code page to Windows-1250 (Central European).
      5. Delete the text and paste ◘○•🔶 ěščřžýáíéúůťďňÓ.
      6. Save, close and re-open the file.
      7. You’ll get •0•?? ěščřžýáíéúůťďňÓ.
      8. Specify the code page again.
      9. Still •0•?? ěščřžýáíéúůťďňÓ.

      Why are the glyphs displayed correctly before saving, but not after re-opening?

      • guy038
        last edited by

        Hello, @uzivatel919 and All

        Before trying to explain this N++ behavior, we need some additional information:

        • At step 1, when you create your new file ( File > New or the Ctrl + N shortcut ), what is the current encoding of this new file, before doing anything else? I guess it should be Windows-1250. Am I right about that?

        • At step 2, before saving the file, did you type pure ASCII characters only ( with codes in the range [\x{0020}-\x{007f}] ), or did you also add some accented characters ( for instance č, ř or ť )?

        • At step 8, just to confirm: you meant “Specify the Windows-1250 code page, again”, didn’t you?

        Best Regards

        guy038

        • Uzivatel919
          last edited by

          1. I use Ctrl + N as well as File > New. For me the encoding is set to UTF-8 by default.
          2. I used arbitrary characters; it just isn’t possible to save an empty file.
          3. Yes, exactly: I chose the Windows-1250 code page again.
          • guy038
            last edited by guy038

            Hi, @uzivatel919 and All

            First, note that the tests below behave the same whether or not the option Autodetect character encoding is checked, in the dialog Settings > Preferences... > MISC.

            Your method can be simplified to this first scenario, below:

            1. Create a new file ( Ctrl + N )

            Note that the present encoding is UTF-8

            2. Select the option Encoding > Character Sets > Central European > Windows-1250

            3. Paste the text ◘○•🔶 ěščřžýáíéúůťďňÓ.

            => This text is encoded with the Windows-1250 encoding, using 1 byte to describe each character. Note that the graphical characters which do not belong to the Windows-1250 encoding are, of course, replaced with a question mark ?!

            https://en.wikipedia.org/wiki/Windows-1250

            4. Save the file with, for instance, the name Test.txt

            5. Close Test.txt ( Ctrl + W )

            6. Re-open Test.txt ( Ctrl + Shift + T )

            => The letters of the text are correct but, as expected, some graphical characters are replaced with a ?. Moreover, Notepad++ detects the ANSI encoding, which is, indeed, equivalent to the Windows-1250 encoding used by your system for all non-Unicode files

            7. Select, again, the option Encoding > Character Sets > Central European > Windows-1250

            => As this operation just re-interprets all the 1-byte encoded characters, nothing changes, because here the Windows-1250 encoding ≡ the ANSI encoding. Note that, as I’m French, on my system the equivalence is Windows-1252 encoding ≡ ANSI encoding
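            To make the byte-level effect of this first scenario concrete, here is a minimal Python sketch ( an illustration only, not Notepad++ code; the exact '?' substitution is an assumption about the editor’s behavior ):

            ```python
            # Illustration only: simulate saving the pasted text as Windows-1250 (cp1250).
            text = "◘○•🔶 ěščřžýáíéúůťďňÓ"

            # Characters with no Windows-1250 code point become a literal '?', one byte per character.
            data = text.encode("cp1250", errors="replace")
            print(data)                   # b'??\x95? \xec\x9a\xe8\xf8...' -- note the real b'?' bytes

            # "Re-selecting" Windows-1250 just decodes the very same bytes again, so nothing changes:
            print(data.decode("cp1250"))  # '??•? ěščřžýáíéúůťďňÓ' -- the lost glyphs stay lost
            ```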


            Now, let’s imagine the second scenario, below:

            1. Create a new file ( Ctrl + N )

            Note that the present encoding is UTF-8

            2. Paste the text ◘○•🔶 ěščřžýáíéúůťďňÓ.

            => This time, as we haven’t changed the current encoding yet, the text is encoded with the UTF-8 encoding, using between 1 and 4 bytes to describe each character

            3. Select the option Encoding > Character Sets > Central European > Windows-1250

            4. Click on the Yes button of the small dialog Save Current Modification

            5. Choose, again, the name Test.txt and save the file

            => So, the encoding is changed to Windows-1250. But this encoding operation does NOT change the present contents of the file. Notepad++ just re-interprets all bytes of the file as if they were a sequence of 1-byte encoded characters of the Windows-1250 encoding => So, it’s no surprise that the text looks rather incomprehensible!

            Thus, internally, the Test.txt file is still a sequence of characters, each described according to the UTF-8 encoding

            6. Close Test.txt ( Ctrl + W )

            7. Re-open Test.txt ( Ctrl + Shift + T )

            => The text and most of the graphical characters are correct, according to your current font, and the UTF-8 encoding is automatically chosen ;-))
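            Again purely as an illustration ( not how Notepad++ works internally ), the second scenario boils down to this Python sketch: the bytes on disk are UTF-8, re-reading them as Windows-1250 only changes how they are displayed, and re-reading them as UTF-8 restores the original text:

            ```python
            # Illustration only: the buffer was still UTF-8 encoded when Test.txt was saved.
            text = "◘○•🔶 ěščřžýáíéúůťďňÓ"
            utf8_bytes = text.encode("utf-8")                      # 1 to 4 bytes per character

            # Selecting the Windows-1250 character set merely re-interprets the same bytes:
            print(utf8_bytes.decode("cp1250", errors="replace"))   # mojibake, but the bytes are untouched

            # Re-opening the file as UTF-8 recovers everything:
            print(utf8_bytes.decode("utf-8"))                      # ◘○•🔶 ěščřžýáíéúůťďňÓ
            ```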

            Remarks:

            • As this text is UTF-8 encoded, you may “test” any other character set, using the Encoding > Character Sets > ... menu options

            • You’ll notice that, during this test phase, the file contents are NOT modified at all and the icon of the file remains blue!

            • At the end, after that test phase, just select the option Encoding > Encode in UTF-8 to get the original text back ;-))


            Remember:

            • During an encoding operation, the present contents of the current file are just re-interpreted as if they were encoded with the new encoding; the contents themselves are never modified

            • During a conversion operation, the present contents of the current file are modified, so that the new file contents represent the same characters in the new encoding

            In other words:

            • The “Encode in ...” and “Character Sets > ...” options just re-read the present file contents according to the newly chosen encoding, generally giving a different representation of the same bytes

            • The “Convert to ...” options do modify the present file contents, so that they read identically under the newly chosen encoding.
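            A short Python sketch may help to visualize the difference ( again just a byte-level analogy, not the editor’s actual code ): re-interpretation decodes the existing bytes with another code page and leaves them untouched, whereas conversion re-encodes the characters and therefore produces new bytes:

            ```python
            # Illustration of "Encode in ..." (re-interpretation) vs "Convert to ..." (conversion).
            original = "ěščř".encode("cp1250")           # bytes as stored on disk in Windows-1250

            # Re-interpretation: same bytes, merely read with another code page (here cp1252).
            reinterpreted = original.decode("cp1252")    # misread, but nothing was rewritten

            # Conversion: decode with the old code page, re-encode with the new one -> new bytes.
            converted = original.decode("cp1250").encode("utf-8")

            print(original)       # b'\xec\x9a\xe8\xf8'
            print(reinterpreted)  # 'ìšèø'
            print(converted)      # b'\xc4\x9b\xc5\xa1\xc4\x8d\xc5\x99'  (UTF-8 bytes for 'ěščř')
            ```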


            To end with:

            • As you see, the encoding and conversion concepts are not easy to assimilate. So, I advise everyone to always use the UTF-8 encoding or, better, the UTF-8-BOM encoding, which is able to encode absolutely all the Unicode characters!

            • Of course, to fully exploit UTF-8 files, your system must contain some fonts which cover most of the Unicode characters and/or symbols!

            • For the record, as of today, 92.6% of Web pages are encoded in UTF-8 ;-)) Refer to the link below:

            https://w3techs.com/technologies/history_overview/character_encoding/ms/y

            Best Regards,

            guy038

            • Uzivatel919
              last edited by Uzivatel919

              Yes, yes, I know about these things. I was just surprised by the result, since I was used to Notepad++’s code-page handling working perfectly. At the time I simply didn’t realize what CP-1250 actually includes.

              Btw, thanks.
