    HTML files with charset = iso: I don't want to see the diacritics (accent marks) in bold

    • Hellena Crainicu

      I used Python to parse text from an old website and move it to a new website. The old website has charset=iso-8859-1 and the new one has charset="utf-8".

      What is the best way to avoid seeing the diacritics (accent marks) rendered in bold? I tried changing charset="iso-8859-1" to charset="utf-8" and vice versa, with the same result: the diacritics are still highlighted.

      [image]

      This is the code of the text in the image:

      <p class="text_obisnuit">&Icirc;ntr-o oarecare m&#259;sur&#259;, f&#259;r&#259; s&#259; &icirc;nt&#259;mpin vreo dificultate, dac&#259; mi-a&#351; m&#259;sura capacit&#259;&#355;ile de inventator prin experien&#355;a confrunt&#259;rii cu moartea, cu privire la modul de a schimba o constant&#259; a reprezent&#259;rii cuceririi &#351;tiin&#355;ei pe o traiectorie axiomatic&#259; unitar&#259;, atunci probabil m-a&#351; ciocni de o latur&#259; trecut&#259; cu vederea, o excep&#355;ie, ceva ce nu a&#351; fi g&acirc;ndit c&#259; pot face. O fi doar o problem&#259; de credin&#355;&#259; &#351;i de for interior?</p>

      <p class="text_obisnuit">Totu&#351;i, sunt un inventator-autodidact, &#351;i pe aceast&#259; cale sunt &icirc;ndrept&#259;&#355;it s&#259; accept modul de &icirc;ncadrare a inven&#355;iilor mele &icirc;n categoria celor ce nu se rostesc, dar se &icirc;nchipuiesc. F&#259;r&#259; excep&#355;ie, lucrul cu materia se poate transforma &icirc;ntr-o rela&#355;ie desf&#259;&#351;urat&#259; &icirc;ntre ceea ce proiectez ca inten&#539;ie, &#537;i efectivitatea protec&#355;iei pe care natura mi-o asigur&#259; cu un singur scop: pentru a-i l&#259;rgi valen&#355;ele de &ldquo;miracol&rdquo; &icirc;n afara materiei vizibile.</p>

      What should I do?

      • PeterJones @Hellena Crainicu

        @hellena-crainicu

        This forum is for Notepad++ questions. Your question has nothing to do with Notepad++: the answer will be the same whether you are using Notepad++, MS notepad.exe, or copy con. If you think “I am typing this with Notepad++, so it should be on topic,” then you haven’t read our FAQ which explains why that is a false interpretation, using the example of baking cookies.

        But I’ll give you a hint: on my machine, that HTML doesn’t display with bold characters.
        (My guess is that it’s a font issue on your PC.)
        Further, the snippet you showed has no characters outside of the ASCII range, so it doesn’t matter whether you have set charset="iso-8859-1" or charset="utf-8". If you don’t understand why having no characters outside of the ASCII range necessarily implies the “so…” part of my previous sentence, you need to go find a better tutorial on character encodings and HTML, because you obviously don’t understand the technology you are working with sufficiently. If you still don’t understand after that, you will have to find a forum that’s about HTML and web formatting, not one for a particular editor, and ask there. The Notepad++ Community Forum is not the right place for further discussion on this.
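
The reason the charset makes no difference here is that ISO-8859-1 and UTF-8 encode every ASCII character as the same single byte, and the diacritics in the snippet are written as HTML character references, which are themselves plain ASCII. A minimal Python sketch of that point follows. The shortened `snippet` string and the variable names are illustrative assumptions, not part of the original post.

```python
import html

# A shortened piece of the markup from the question. Every character is
# plain ASCII because the diacritics are written as character references.
snippet = '<p class="text_obisnuit">&Icirc;ntr-o oarecare m&#259;sur&#259;</p>'

# ASCII characters map to the same single bytes in both encodings,
# so the file on disk is byte-for-byte identical either way.
as_latin1 = snippet.encode("iso-8859-1")
as_utf8 = snippet.encode("utf-8")
print(as_latin1 == as_utf8)

# The references are resolved to Unicode characters only after the bytes
# have been decoded, so the declared charset does not change the result.
decoded = html.unescape(snippet)
print(decoded)
```

If the two byte strings compare equal, switching the declared charset cannot change how this markup is decoded, which leaves rendering (for example, the font) as the more likely place to look.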

        You can even use Notepad++ to prove to yourself that it doesn’t matter which charset you pick, given the data you showed:

        • FIND = [^\x20-\x7e\r\n] : this finds any character that is neither in the range ASCII 32 (0x20) to ASCII 126 (0x7E) nor a CR or LF newline character.
        • COUNT

        In your snippet, it finds 0 characters outside of that range. That means there is nothing in that snippet which is not ASCII, and thus nothing that will be different between ISO-8859-1 and UTF-8.
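
Since the original conversion was done in Python, the same check can be sketched there too. This is only an approximation of the Notepad++ FIND and COUNT steps using Python's `re` module. The helper name `count_non_ascii` and the shortened `snippet` are assumptions made for illustration.

```python
import re

# Same character class as the FIND expression above: anything that is not
# printable ASCII (0x20 to 0x7E), a carriage return, or a line feed.
NON_ASCII = re.compile(r"[^\x20-\x7e\r\n]")


def count_non_ascii(text: str) -> int:
    """Return how many characters fall outside the allowed ASCII set."""
    return len(NON_ASCII.findall(text))


# A shortened piece of the markup from the question (hypothetical excerpt).
snippet = '<p class="text_obisnuit">Totu&#351;i, sunt un inventator-autodidact</p>\r\n'
print(count_non_ascii(snippet))
```

As with the Notepad++ expression, a tab character would also be counted, because it lies below 0x20.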

        OTOH, if I add the characters ÀÁÅËË and do the COUNT again, it now counts 5 matches in the file, for those five characters.
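
A rough Python counterpart of this second count, assuming the five characters are the precomposed code points U+00C0, U+00C1, U+00C5, U+00CB, U+00CB (written as escapes so the example does not depend on how this page is encoded). It also prints the bytes each character produces, to show where the two charsets actually diverge.

```python
import re

# The five added characters, assumed to be precomposed Latin-1 letters.
extra = "\u00c0\u00c1\u00c5\u00cb\u00cb"

# Same character class as the FIND expression above.
matches = re.findall(r"[^\x20-\x7e\r\n]", extra)
print(len(matches))

# Outside the ASCII range the encodings differ: ISO-8859-1 uses one byte
# per character here, while UTF-8 uses two.
for ch in sorted(set(extra)):
    print(ch, ch.encode("iso-8859-1"), ch.encode("utf-8"))
```

Only for characters like these does the declared charset have to match the bytes actually stored in the file.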
