    Find/Replace & deleting

    8 Posts 4 Posters 2.1k Views
      Dave Dyet

      Hi
      I am new to Notepad++. I tried using this URL https://techbrij.com/copy-extract-html-drop-down-list-options-text but Notepad++ returned “bad command”. Maybe it's the wrong syntax?

      I am looking to delete everything except the country names.

      Here is part of the file… thanks for the help! dd

      <li class=“ui-menu-item” role=“presentation”><a id=“ui-id-1806” class=“ui-corner-all” tabindex=“-1”>Africa</a></li><li class=“ui-menu-item” role=“presentation”><a id=“ui-id-1807” class=“ui-corner-all” tabindex=“-1”>Angola</a></li><li class=“ui-menu-item” role=“presentation”><a id=“ui-id-1808” class=“ui-corner-all” tabindex=“-1”>Argentina</a></li><li class=“ui-menu-item” role=“presentation”><a id=“ui-id-1809” class=“ui-corner-all” tabindex=“-1”>Armenia</a></li><li class=“ui-menu-item” role=“presentation”><a id=“ui-id-1810” class=“ui-corner-all” tabindex=“-1”>Asia</a></li><li class=“ui-menu-item” role=“presentation”><a id=“ui-id-1811” class=“ui-corner-all” tabindex=“-1”>Australia</a></li><li class=“ui-menu-item” role=“presentation”><a id=“ui-id-1812” class=“ui-corner-all” tabindex=“-1”>Australia > New South Wales</a></li><li class=“ui-menu-item” role=“presentation”><a id=“ui-id-1813” class=“ui-corner-all” tabindex=“-1”>Australia > Northern Territory</a></li><li class=“ui-menu-item” role=“presentation”><a id=“ui-id-1814” class=“ui-corner-all” tabindex=“-1”>

        PeterJones

        @Dave-Dyet said:

        I tried using this URL https://techbrij.com/copy-extract-html-drop-down-list-options-text but Notepad++ returned “bad command”

        How are you using a URL as a command inside Notepad++?

        If you mean you tried following the instructions listed at that URL, where specifically in the sequence does it say “bad command” for you?

          Dave Dyet

          @PeterJones said:

          tried following the instructions listed at that URL

          Hi
          Yes, you're correct, I mean I tried following the instructions listed at that URL for Notepad++.

          I tried it again with the same syntax <option[^>]>([^<])</option> … etc… now it says “0 occurrences were replaced”

          I am trying to figure out how to strip all the data out, leaving only the country names behind, one per line and left-justified, i.e.

          Africa
          Angola
          Argentina
          Armenia

          Thanks for the help!

            Terry R

            @Dave-Dyet said:

            leaving the country names behind and returning the names left justified.

            Hi Dave, it would appear from your example that the country names are the only text/words that start with a capital letter. If this is indeed correct we can use that to our advantage. My suggestion for a regex is:
            Find What: (?-i)[^A-Z]+(\z|[A-Z].+?(?=</a))
            Replace With: \1\r\n
            Search mode MUST be regular expression.

            The (?-i) at the start means this will be a case-sensitive search. We first consume any run of characters that are NOT capital letters. Once a capital letter is found, we start capturing and stop just before the </a combination, which marks the end of the country name. The \z alternative is there so that, once the last country name has been found, the regex continues to the end of the file and drops all the remaining characters.

            A quick test on your example seemed to work. Please note I assumed you want the state within the country also captured, thus Northern Territory is also captured.
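
For anyone who wants to try the same idea outside Notepad++, here is a minimal Python sketch of the regex above. It is an adaptation under a few assumptions, not the exact Notepad++ expression: the `(?-i)` prefix is dropped because Python's `re` is case-sensitive by default, `\Z` stands in for `\z`, and `SAMPLE` and `extract_names` are hypothetical names made up for this illustration (the sample is a shortened version of the posted data, typed with straight quotes).

```python
import re

# Shortened stand-in for the posted data (hypothetical, straight quotes assumed).
SAMPLE = (
    '<li class="ui-menu-item" role="presentation">'
    '<a id="ui-id-1806" class="ui-corner-all" tabindex="-1">Africa</a></li>'
    '<li class="ui-menu-item" role="presentation">'
    '<a id="ui-id-1812" class="ui-corner-all" tabindex="-1">'
    'Australia > New South Wales</a></li>'
    '<li class="ui-menu-item" role="presentation">'
    '<a id="ui-id-1814" class="ui-corner-all" tabindex="-1">'
)

# Skip a run of non-capitals, then either hit end of input (\Z)
# or capture from a capital letter up to, but not including, "</a".
PATTERN = re.compile(r"[^A-Z]+(\Z|[A-Z].+?(?=</a))")


def extract_names(html):
    """Return the captured names, one list entry per name."""
    replaced = PATTERN.sub(r"\1\n", html)
    # The final match (trailing markup) leaves an empty line; drop it.
    return [line for line in replaced.splitlines() if line]


names = extract_names(SAMPLE)
print(names)
```

The empty-line filter is only there because the last match, the one that ends at `\Z`, is also replaced by a line break.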

            Let us know how it went. If a problem arises, we can probably alter the regex, provided you show us the situation where it did NOT work.

            Terry

              PeterJones

              @Dave-Dyet said:

              <option[^>]>([^<])</option> … etc… now it says “0 occurrences were replaced”

              Even assuming you used the actual <option[^>]*>([^<]*)</option> regex that they suggested rather than what we see in the forum (which was likely mangled by the forum when you pasted it in because you didn’t use Markdown formatting [see boilerplate below]): how did you expect that regex to work given your data? Their example was for HTML using the <option> tags, and extracting the values from there. Your example text had everything in <li> tag pairs… why would you think the word “option” would magically match “li”? Also, your data has <a> tags nested in the <li> tags.
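
A small Python sketch can make the mismatch concrete. It assumes Python's `re` module rather than Notepad++, and both input strings are hypothetical: `option_html` is invented markup of the kind the tutorial targets, and `li_html` is a shortened version of the data posted in this thread.

```python
import re

# The regex from the tutorial: captures the text of each <option> element.
OPTION_REGEX = re.compile(r"<option[^>]*>([^<]*)</option>")

# Hypothetical <select> content, the kind of input the tutorial expects.
option_html = '<option value="1">Africa</option><option value="2">Angola</option>'

# Shortened version of the <li>/<a> markup from this thread.
li_html = '<li class="ui-menu-item"><a id="ui-id-1806">Africa</a></li>'

option_matches = OPTION_REGEX.findall(option_html)
li_matches = OPTION_REGEX.findall(li_html)

print(option_matches)  # text of each <option>
print(li_matches)      # nothing: the literal text "<option" never occurs
```

Since the literal text `<option` never appears in the `<li>` data, the regex finds nothing there, which is consistent with the "0 occurrences were replaced" message.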

              Since you seem to just want to delete all tags in the example you provided, I’d probably do a two-step:

              1. FIND = <[^>]*>, REPLACE = \n, MODE = regular expression – this will get rid of all the tags, but there are extra newlines
              2. FIND = \R+, REPLACE = \n, MODE = regular expression – this will collapse all series of multiple newlines down into a single newline.

              (Note: the original regex and mine both assume you want linux-style LF newlines \n rather than windows-style CRLF newlines \r\n)
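
The two steps can be sketched in Python as follows. This is an approximation under stated assumptions: Python's `re` has no `\R`, so the common line-break sequences are spelled out instead, and `SAMPLE` and `strip_tags_two_step` are hypothetical names (the sample is a shortened, straight-quoted version of the posted data).

```python
import re

# Shortened stand-in for the posted data (hypothetical).
SAMPLE = (
    '<li class="ui-menu-item"><a id="ui-id-1806">Africa</a></li>'
    '<li class="ui-menu-item"><a id="ui-id-1812">'
    'Australia > New South Wales</a></li>'
    '<li class="ui-menu-item"><a id="ui-id-1814">'
)


def strip_tags_two_step(html):
    """Replace every tag with a newline, then collapse repeated newlines."""
    # Step 1: each <...> tag becomes a single LF.
    no_tags = re.sub(r"<[^>]*>", "\n", html)
    # Step 2: collapse runs of line breaks into one LF
    # (stand-in for \R+, which Python's re does not support).
    return re.sub(r"(?:\r\n|\r|\n)+", "\n", no_tags)


result = strip_tags_two_step(SAMPLE)
print(repr(result))
```

Because the data starts with a tag, the result starts with one line break; in this sketch it could be removed afterwards with `result.lstrip("\n")`.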

              -----
              Boilerplate to help you with formatting:

              This forum is formatted using Markdown, with a help link buried on the little grey ? in the COMPOSE window/pane when writing your post. For more about how to use Markdown in this forum, please see @Scott-Sumner’s post in the “how to markdown code on this forum” topic, and my updates near the end. It is very important that you use these formatting tips – using single backtick marks around small snippets, and using code-quoting for pasting multiple lines from your example data files – because otherwise, the forum will change normal quotes ("") to curly “smart” quotes (“”), will change hyphens to dashes, will sometimes hide asterisks (or if your text is c:\folder\*.txt, it will show up as c:\folder*.txt, missing the backslash). If you want to clearly communicate your text data to us, you need to properly format it.

                PeterJones

                @Terry-R said:

                (?-i)[^A-Z]+(\z|[A-Z].+?(?=</a))

                Ah, Terry beat me by a couple of minutes, and was able to find the alternation that does it in a single pass rather than in two. Use that one instead.

                  Dave Dyet

                  Thanks so much for the answers and explanation, it was really helpful. I used Terry’s response and it worked great!

                  Cheers
                  Dave

                    guy038

                    Hello, @dave-dyet, @terry-r, @peterjones and All,

                    Here is a 3rd possible solution :

                    SEARCH (?s-i).+?(\u.+?)(?=<)|(?s).+

                    REPLACE ?1\1\r\n ( OR ?1\1\n for Unix files )

                    Notes :

                    • The remainder of the text, near the very end of the file, is simply wiped out. Indeed, when the second alternative (?s).+ matches, group 1 does not exist, so the conditional replacement ?1... inserts nothing and the matched text is deleted.

                    • I used the \u syntax which, in a case-sensitive search, matches any uppercase letter of any Western Unicode script ( Latin, Greek, Cyrillic,… ). It’s probably unnecessary here since, in English, no country name begins with an accented character, anyway ! However, in this specific case, writing (?-i)\u is as easy as writing (?-i)[A-Z] ! Refer to the list of sovereign states, below :

                    https://en.wikipedia.org/wiki/List_of_sovereign_states

                    And we get the text, below :

                    Africa
                    Angola
                    Argentina
                    Armenia
                    Asia
                    Australia
                    Australia > New South Wales
                    Australia > Northern Territory
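
A rough Python equivalent of this 3rd solution is sketched below. Several things are assumptions of the sketch rather than part of the original: Python's `re` has neither `\u` nor the `?1` conditional replacement, so `[A-Z]` (ASCII capitals only) and a replacement function are used instead, and `SAMPLE`, `keep_group_1` and `third_solution` are hypothetical names.

```python
import re

# Shortened stand-in for the posted data (hypothetical).
SAMPLE = (
    '<li class="ui-menu-item"><a id="ui-id-1806">Africa</a></li>'
    '<li class="ui-menu-item"><a id="ui-id-1812">'
    'Australia > New South Wales</a></li>'
    '<li class="ui-menu-item"><a id="ui-id-1814">'
)

# First alternative: skip lazily to a capital letter, capture up to the next "<".
# Second alternative: swallow whatever is left when no capital letter remains.
PATTERN = re.compile(r".+?([A-Z].+?)(?=<)|.+", re.DOTALL)


def keep_group_1(match):
    """Emulate the conditional replacement ?1\\1\\n of the original."""
    if match.group(1) is not None:
        return match.group(1) + "\n"
    return ""  # group 1 did not take part: delete the matched text


def third_solution(html):
    return PATTERN.sub(keep_group_1, html)


result = third_solution(SAMPLE)
print(result, end="")
```

The replacement function plays the role of the conditional `?1...`: text is emitted only when group 1 took part in the match.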
                    

                    Peter, from your solution, I built a new version, which can do the whole job in one go ;-)) So, here is the 4th version :

                    SEARCH (?-s)<.+?>|^\h*\R?|(.+?)(?=<)

                    REPLACE ?1\1\r\n ( OR ?1\1\n for Unix files )

                    Notes :

                    • This regex also allows the pertinent items to begin with a lowercase letter !

                    • If group 1 does not exist, then the <.....> blocks OR possible leading blank chars, followed by a possible line-break, are deleted

                    • If group 1 does exist, then the different items of the drop-down list are listed, as usual, one per line
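
The 4th version can be approximated in Python along the same lines. As before, this is a sketch under assumptions: `\h` and `\R` are not available in Python's `re`, so `[ \t]` and an explicit line-break alternation stand in for them, the `?1` conditional replacement becomes a function, and `SAMPLE`, `keep_group_1` and `fourth_solution` are hypothetical names. The sample deliberately includes a lowercase item.

```python
import re

# Hypothetical single-line sample; the first item starts with a lowercase letter.
SAMPLE = (
    '<li class="ui-menu-item"><a id="ui-id-1806">africa</a></li>'
    '<li class="ui-menu-item"><a id="ui-id-1812">'
    'Australia > New South Wales</a></li>'
    '<li class="ui-menu-item"><a id="ui-id-1814">'
)

# Alternatives, tried in order:
#   1. a whole <...> tag (deleted)
#   2. leading blanks plus an optional line break at a line start (deleted)
#   3. any text up to the next "<", captured as group 1 (kept)
PATTERN = re.compile(
    r"<.+?>|^[ \t]*(?:\r\n|\r|\n)?|(.+?)(?=<)",
    re.MULTILINE,
)


def keep_group_1(match):
    """Keep captured items, one per line; delete everything else."""
    if match.group(1) is not None:
        return match.group(1) + "\n"
    return ""


def fourth_solution(html):
    return PATTERN.sub(keep_group_1, html)


result = fourth_solution(SAMPLE)
print(result, end="")
```

Unlike the capital-letter approaches, nothing here depends on how an item starts, which is why the lowercase `africa` survives.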

                    Best Regards,

                    guy038
