    Regex: Add html tags in the lines that doesn't have html tags

      • Robin Cruise

        OK, I believe I took a step forward. This seems to work:

        FIND: ^(?!<p class="mb-40px">)(([a-zA-Z-].+))((?!</p>).)*$

        REPLACE BY: <p class="mb-40px">\2</p>

        Now I have to restrict this regex to the section between:
        <!-- START --> and <!-- FINAL -->

        I will use this generic formula:

        (?s)(?-i:REGION-START.+?">|\G(?!^))((?!REGION-FINAL).)*?\KFIND REGEX

        will become:

        FIND: (?s)(?-i:<\!-- START -->.+?">|\G(?!^))((?!<\!-- FINAL -->).)*?\K^(?!<p class="mb-40px">)(([a-zA-Z-].+))((?!</p>).)*$

        REPLACE: <p class="mb-40px">\2</p>

        In this case the result is not very good: something does not work well in this final regex. Maybe @guy038 has a better idea.

        • guy038

          Hello @robin-cruise and All,

          No need to use the generic formula !

          Here is my general method :

          • From the beginning of the current line, I look for a line which :

            • Does not contain the string <!-- START -->, at any position of the current line
              AND
            • Does not contain the string <!-- FINAL -->, at any position of the current line
              AND
              (
            • Does not contain a <p class="mb-40px"> tag, at any position of the current line
              OR
            • Does not contain a </p> tag, at any position of the current line
              )
          • Then I select all the characters of the current line which come :

            • After a possible <p class="mb-40px"> tag

            • Before a possible </p> tag


          So, given this INPUT text, below, with 3 lines to change :

          <!-- START -->
          
          <p class="mb-40px">I may go to cinema</p>
          
          I need someone to take me home.
          
          <p class="mb-40px">I may go to cinema</p>
          
          I need someone to take me home.</p>
          
          <p class="mb-40px">I may go to cinema</p>
          
          <p class="mb-40px">I need someone to take me home.
          
          <p class="mb-40px">I can love you now</p>
          
          <!-- FINAL -->
          

          I use the following regex S/R :

          SEARCH (?-is)^(?!.*<!-- START -->)(?!.*<!-- FINAL -->)(?:(?!.*<p class)|(?!.*</p>))(?:<p class="mb-40px">)?(?|(.+)</p>|(.+))

          REPLACE <p class="mb-40px">\1</p>

          And, after a click on the Replace All button, I get the expected OUTPUT text :

          <!-- START -->
          
          <p class="mb-40px">I may go to cinema</p>
          
          <p class="mb-40px">I need someone to take me home.</p>
          
          <p class="mb-40px">I may go to cinema</p>
          
          <p class="mb-40px">I need someone to take me home.</p>
          
          <p class="mb-40px">I may go to cinema</p>
          
          <p class="mb-40px">I need someone to take me home.</p>
          
          <p class="mb-40px">I can love you now</p>
          
          <!-- FINAL -->
          

          Notes :

          • First, after the usual modifiers, the two look-aheads for the boundaries, which must not occur in the current line : (?!.*<!-- START -->)(?!.*<!-- FINAL -->)

          • Then, in a non-capturing group, an alternation of two look-aheads, so that at least one of the two tags must be absent from the current line : (?:(?!.*<p class)|(?!.*</p>))

          • Now, after a possible (?:<p class="mb-40px">)? tag, also in a non-capturing group, the regex selects, either :

            • All chars before the </p> tag
              OR
            • All remaining chars of current line

          Remark :

          • Note the special syntax of the group (?|(.+)</p>|(.+)). This is a branch reset group : the capturing groups of each alternative are numbered from the same starting point, so both (.+) parts are group 1. Thus, you just need the <p class="mb-40px">\1</p> syntax in the replacement part

          • If I had used a normal non-capturing group (?:(.+)</p>|(.+)), two groups, 1 and 2, would have been defined ! So the correct replacement regex would have been <p class="mb-40px">\1\2</p>, as these two groups are mutually exclusive !
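
          Python's re module has no branch reset group, so a port of the S/R above has to fall back on the \1\2 form described in the second remark. The following is only a sketch for checking the logic outside Notepad++ : the function name wrap_paragraphs and the shortened sample are illustrative assumptions, and it assumes \n line endings and Python 3.5 or later ( where an unmatched group is replaced by an empty string ).

```python
import re

# Same look-aheads as the Notepad++ search, but with a plain non-capturing
# group: alternative 1 fills group 1, alternative 2 fills group 2.
SEARCH = re.compile(
    r'^(?!.*<!-- START -->)(?!.*<!-- FINAL -->)'
    r'(?:(?!.*<p class)|(?!.*</p>))'
    r'(?:<p class="mb-40px">)?(?:(.+)</p>|(.+))',
    re.MULTILINE,
)

# Groups 1 and 2 are mutually exclusive, so exactly one of them is non-empty.
REPLACE = r'<p class="mb-40px">\1\2</p>'


def wrap_paragraphs(text: str) -> str:
    """Add the missing opening and/or closing tag to each incomplete line."""
    return SEARCH.sub(REPLACE, text)


sample = "\n".join([
    "<!-- START -->",
    '<p class="mb-40px">I may go to cinema</p>',
    "I need someone to take me home.",
    "I need someone to take me home.</p>",
    '<p class="mb-40px">I need someone to take me home.',
    "<!-- FINAL -->",
])

print(wrap_paragraphs(sample))
```

          Lines that already carry both tags, and the two boundary lines, are rejected by the look-aheads and so are left untouched.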

          Best Regards,

          guy038

          • Robin Cruise @guy038

            @guy038 super, thanks.

            What would the generic regex be in this case? (I cannot figure out the last part.)

            (?-is)^(?!.*REGION-START)(?!.*REGION-FINAL)(?:(?!.*<p class)|(?!.*</p>))(?:<p class="mb-40px">)?(?|(.+)</p>|(.+))

            • guy038

              Hi, @robin-cruise,

              You cannot use the generic regex, discussed in the topic :

              https://community.notepad-plus-plus.org/topic/22690/generic-regex-replacing-in-a-specific-zone-of-text

              to reach your present goal. Why ?


              Well, because that generic regex supposes :

              • First, to match the BSR ( Begin Search Region ), followed with any range of chars, possibly null, different from the ESR ( End Search Region ), and, after a \K feature, to match the FR ( Find Regex )

              • Then, to match, from the current caret position, any range of chars, possibly null, different from the ESR, and, after a \K feature, to match the FR


              But, in your present case, the INPUT lines to modify, like I need someone to take me home., do not contain the BSR and/or the ESR. So, how do you expect to find these absent regions with the search regex ??
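
              Outside the editor, one way around this is to locate the zone separately and only then run a line-level regex inside it. The sketch below is in Python, not Notepad++ syntax, and is not the generic regex from the linked topic : the name wrap_in_zone and the sample text are illustrative assumptions, and it assumes a single <!-- START --> ... <!-- FINAL --> zone, boundaries alone on their lines, and \n line endings.

```python
import re

# Step 1: isolate the zone (boundaries kept in groups 1 and 3).
ZONE = re.compile(r'(<!-- START -->)(.*?)(<!-- FINAL -->)', re.DOTALL)

# Step 2: inside the zone, take each non-empty line apart:
# optional opening tag, then text up to a closing tag or to the end of line.
LINE = re.compile(r'^(?:<p class="mb-40px">)?(?:(.+)</p>|(.+))$', re.MULTILINE)


def wrap_in_zone(text: str) -> str:
    """Wrap every non-empty line of the zone; leave the rest of the text alone."""
    def fix_zone(match):
        inner = LINE.sub(r'<p class="mb-40px">\1\2</p>', match.group(2))
        return match.group(1) + inner + match.group(3)

    return ZONE.sub(fix_zone, text, count=1)


sample = "\n".join([
    '"sku": "NFL",',
    "<!-- START -->",
    "I need someone to take me home.",
    '<p class="mb-40px">I may go to cinema</p>',
    "<!-- FINAL -->",
    "</script>",
])

print(wrap_in_zone(sample))
```

              Lines outside the zone, such as the JSON line in the sample, never reach the line-level regex, and a line that already has both tags is rewritten to itself.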

              Best Regards,

              guy038

              • Robin Cruise @guy038

                @guy038

                SEARCH: (?-is)^(?!.*<!-- START -->)(?!.*<!-- FINAL -->)(?:(?!.*<p class)|(?!.*</p>))(?:<p class="mb-40px">)?(?|(.+)</p>|(.+))

                REPLACE: <p class="mb-40px">\1</p>

                Your regex seems very good, except for one thing: if my HTML pages also contain code like the sample below, it gets changed too.

                So I need to change only the lines between <!-- START --> and <!-- FINAL -->.

                <html lang="en">
                <head>
                  <!-- Meta Tags -->
                  <meta charset="utf-8"/>
                
                <script type="application/ld+json">
                {
                  "@context": "https://schema.org/", 
                  "@type": "Product", 
                  "name": "10 media farces of big days",
                  "image": "icon.jpg",
                  "description": "horses of Letea Delta Danube successfully saved,",
                  "brand": {
                    "@type": "Brand",
                    "name": "something"
                  },
                  "sku": "NFL",
                  "gtin8": "NFL",
                  "offers": {
                    "@type": "Offer",
                    "url": "https://something.html",
                    "priceCurrency": "RON",
                    "price": "0",
                    "priceValidUntil": "2022-02-15",
                    "availability": "https://schema.org/OnlineOnly"
                  },
                  "aggregateRating": {
                    "@type": "AggregateRating",
                    "ratingValue": "5",
                    "bestRating": "5",
                    "ratingCount": "6"
                  },
                  "review": {
                    "@type": "Review",
                    "reviewRating": {
                      "@type": "Rating",
                      "ratingValue": "5",
                      "bestRating": "5"
                    },
                    "author": {"@type": "Person", "name": "omehing"},
                    "publisher": {"@type": "Organization", "name": "omehing"}
                  }
                }
                </script>
                
                • guy038

                  Hi, @robin-cruise,

                  Once and for all, Robin, please, post a complete / exact file, which represents all your data that you need to change !

                  We cannot keep working this way if you do not provide real examples, because regexes depend closely on the real text !

                  BR

                  guy038

                  • Robin Cruise @guy038

                    @guy038

                    Yes, but I cannot copy/paste the entire HTML page. It is a very large file.

                    • guy038

                      Hi, @Robin-cruise

                      If you don’t mind, just send me your file by e-mail !

                      Here is my temporary mail address :

                      BR

                      guy038

                      • guy038

                        Hello @robin-cruise and All,

                        Ah… OK. Thanks for your attached HTML file with your mail. It’s always easier with a real example ;-))

                        Now, as you just have one <!-- ARTICOL START -->.......<!-- ARTICOL FINAL --> zone in your HTML file, the simple thing to do is :


                        • In search, to look for :

                          • Any char from the very start of file till the complete <!-- ARTICOL START --> line
                            OR
                          • Any char from the <!-- ARTICOL FINAL --> line till the very end of your file
                            OR ( scan of the lines between the <!-- ARTICOL START --> and <!-- ARTICOL FINAL --> boundaries )
                          • A possible <p class="mb-40px"> tag, beginning the current line, followed with a single-line range of characters :

                            • Till a </p> tag, ending the current line
                              OR
                            • Till the end of current line
                        • In replacement, to rewrite :

                          • ( If the scan is within the <!-- ARTICOL START -->.........<!-- ARTICOL FINAL --> zone, so when group 2 is defined )

                            • First, the <p class="mb-40px"> tag, if absent in the INPUT file ( group 1 not defined )

                            • Then all the contents of current line ( $0 )

                            • And, finally, the </p> tag, if absent in the INPUT file ( group 3 not defined )
                            OR
                          • The two ranges of chars, up to and including the <!-- ARTICOL START --> line and from the <!-- ARTICOL FINAL --> line onwards, unchanged ( which occurs when group 2 is not defined )

                        For instance, from this INPUT file, below :

                        <!DOCTYPE html>
                        ....
                        bla bla
                        ....
                        blah bla
                        
                        <!-- ARTICOL START -->
                        
                        <p class="mb-40px">I need someone to take me home.</p>
                        
                        I need someone to take me home.
                        
                        I need someone to take me home.</p>
                        
                        <p class="mb-40px">I need someone to take me home.
                        
                        <!-- ARTICOL FINAL -->
                        
                        bla bla
                        ....
                        blah bla
                        ....
                        </html>
                        

                        The following regex S/R :

                        SEARCH (?s-i)^.+<!-- ARTICOL START -->\R|<!-- ARTICOL FINAL -->.+|(?-s)^(<p class="mb-40px">)?(?|(.+)(</p>)|(.+))$

                        REPLACE ?2(?1:<p class="mb-40px">)$0(?3:</p>):$0

                        Should give you the expected results :

                        <!DOCTYPE html>
                        ....
                        bla bla
                        ....
                        blah bla
                        
                        <!-- ARTICOL START -->
                        
                        <p class="mb-40px">I need someone to take me home.</p>
                        
                        <p class="mb-40px">I need someone to take me home.</p>
                        
                        <p class="mb-40px">I need someone to take me home.</p>
                        
                        <p class="mb-40px">I need someone to take me home.</p>
                        
                        <!-- ARTICOL FINAL -->
                        
                        bla bla
                        ....
                        blah bla
                        ....
                        </html>
                        

                        The message Replace All: 6 occurences were replaced is displayed in the status bar :

                        • One for the part between <!DOCTYPE html> and <!-- ARTICOL START -->
                        • One for each non-empty line between <!-- ARTICOL START --> and <!-- ARTICOL FINAL --> ( 4 lines )
                        • One for the part between <!-- ARTICOL FINAL --> and the very end of file

                        Note that this final solution does not need any look-ahead structure, the \G syntax or other goodies !!
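
                        The conditional replacement ?2(?1:<p class="mb-40px">)$0(?3:</p>):$0 is specific to the Boost engine used by Notepad++. As a rough sketch of the same single-pass idea in Python, a callback can play the role of the conditional. The named groups open, body, close and rest are assumptions that stand in for the numbered groups ( Python's re has no branch reset ), \R is replaced by \n, and wrap_article is a hypothetical name.

```python
import re

P_OPEN = '<p class="mb-40px">'
P_CLOSE = '</p>'

SEARCH = re.compile(
    r'(?s:^.+<!-- ARTICOL START -->\n)'       # start of file up to the START line
    r'|(?s:<!-- ARTICOL FINAL -->.+)'         # FINAL line up to the end of file
    r'|^(?P<open><p class="mb-40px">)?'       # inside the zone: optional opening tag
    r'(?:(?P<body>.+)(?P<close></p>)|(?P<rest>.+))$',
    re.MULTILINE,
)


def fix(match):
    """Add only the tags that are missing; copy the head and tail of the file as-is."""
    in_zone = match.group('body') is not None or match.group('rest') is not None
    if not in_zone:
        return match.group(0)
    prefix = '' if match.group('open') else P_OPEN
    suffix = '' if match.group('close') else P_CLOSE
    return prefix + match.group(0) + suffix


def wrap_article(text: str):
    """Return (new_text, number_of_matches), like the Replace All count."""
    return SEARCH.subn(fix, text)


sample = "\n".join([
    "<!DOCTYPE html>",
    "bla bla",
    "<!-- ARTICOL START -->",
    '<p class="mb-40px">I need someone to take me home.</p>',
    "I need someone to take me home.",
    "I need someone to take me home.</p>",
    '<p class="mb-40px">I need someone to take me home.',
    "<!-- ARTICOL FINAL -->",
    "bla bla",
    "</html>",
])

new_text, count = wrap_article(sample)
print(count)
print(new_text)
```

                        As in the editor, the head and the tail of the file each count as one match and are copied back unchanged, which is what keeps the line-level alternative from ever seeing them.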

                        Best Regards,

                        guy038

                        • Robin Cruise @guy038

                          @guy038 said in Regex: Add html tags in the lines that doesn't have html tags:

                          SEARCH (?s-i)^.+<!-- ARTICOL START -->\R|<!-- ARTICOL FINAL -->.+|(?-s)^(<p class="mb-40px">)?(?|(.+)(</p>)|(.+))$
                          REPLACE ?2(?1:<p class="mb-40px">)$0(?3:</p>):$0

                          great answer, thank you @guy038

                          The Community of users of the Notepad++ text editor.