    Regex: Add html tags in the lines that doesn't have html tags

      Robin Cruise

      I have this paragraph. It also contains one line, I need someone to take me home., that doesn’t have HTML tags. So I need to find this line (and no others) and wrap it in tags:

      <!-- START -->
      
      <p class="mb-40px">I may go to cinema</p>
      
      I need someone to take me home.
      
      <p class="mb-40px">I can love you now</p>
      
      <!-- FINAL -->
      

      OUTPUT:

      <!-- START -->
      
      <p class="mb-40px">I may go to cinema</p>
      
      <p class="mb-40px">I need someone to take me home.</p>
      
      <p class="mb-40px">I can love you now</p>
      
      <!-- FINAL -->
      

      I don’t know why my regex doesn’t work.

      FIND: ^(?!<p class="mb-40px">)(.*?)((?!</p>).)*$

      REPLACE: <p class="mb-40px">\2\</p>
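
A minimal sketch of why this first attempt misbehaves, using Python's re module as a stand-in for the Notepad++ regex engine (an assumption: the two engines agree on this point, but they are not identical). The name FIRST_TRY is hypothetical. It shows that group 2 sits inside a repeated group, so it only keeps the last character of the line, and that the comment lines are matched too.

```python
import re

# The first FIND pattern, copied as posted.
FIRST_TRY = re.compile(r'^(?!<p class="mb-40px">)(.*?)((?!</p>).)*$')

plain = FIRST_TRY.search("I need someone to take me home.")
marker = FIRST_TRY.search("<!-- START -->")
tagged = FIRST_TRY.search('<p class="mb-40px">I may go to cinema</p>')

# Group 1 is lazy and stays empty; group 2 is repeated, so it only
# remembers the last character that it matched.
print(repr(plain.group(1)), repr(plain.group(2)))

# The comment line is matched as well, because nothing excludes it.
print(marker is not None)

# Only a line that starts with the opening tag is skipped.
print(tagged is None)
```

So a replacement built on \2 can only ever insert one character of the original line.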

          Robin Cruise
          OK, I believe I took a step forward. This seems to work:

          FIND: ^(?!<p class="mb-40px">)(([a-zA-Z-].+))((?!</p>).)*$

          REPLACE BY: <p class="mb-40px">\2</p>

          Now, I have to restrict this regex to the section between:
          <!-- START --> and <!-- FINAL -->

          I will use this generic formula:

          (?s)(?-i:REGION-START.+?">|\G(?!^))((?!REGION-FINAL).)*?\KFIND REGEX

          will become:

          FIND: (?s)(?-i:<\!-- START -->.+?">|\G(?!^))((?!<\!-- FINAL -->).)*?\K^(?!<p class="mb-40px">)(([a-zA-Z-].+))((?!</p>).)*$

          REPLACE: <p class="mb-40px">\2</p>

          In this case, it does not work very well: something is wrong with this final regex. Maybe @guy038 has a better idea.

            guy038

            Hello @robin-cruise and All,

            No need to use the generic formula !

            Here is my general method:

            • From the beginning of the current line, I look for a line which satisfies all of these conditions:

              • No string <!-- START --> at any position of the current line
                AND
              • No string <!-- FINAL --> at any position of the current line
                AND
                (
              • No tag <p class="mb-40px"> at any position of the current line
                OR
              • No tag </p> at any position of the current line
                )
            • Then I select all the characters of the current line which come:

              • After a possible <p class="mb-40px"> tag

              • Before a possible </p> tag


            So, given this INPUT text, below, with 3 lines to change :

            <!-- START -->
            
            <p class="mb-40px">I may go to cinema</p>
            
            I need someone to take me home.
            
            <p class="mb-40px">I may go to cinema</p>
            
            I need someone to take me home.</p>
            
            <p class="mb-40px">I may go to cinema</p>
            
            <p class="mb-40px">I need someone to take me home.
            
            <p class="mb-40px">I can love you now</p>
            
            <!-- FINAL -->
            

            I use the following regex S/R :

            SEARCH (?-is)^(?!.*<!-- START -->)(?!.*<!-- FINAL -->)(?:(?!.*<p class)|(?!.*</p>))(?:<p class="mb-40px">)?(?|(.+)</p>|(.+))

            REPLACE <p class="mb-40px">\1</p>

            And, after a click on the Replace All button, I get the expected OUTPUT text :

            <!-- START -->
            
            <p class="mb-40px">I may go to cinema</p>
            
            <p class="mb-40px">I need someone to take me home.</p>
            
            <p class="mb-40px">I may go to cinema</p>
            
            <p class="mb-40px">I need someone to take me home.</p>
            
            <p class="mb-40px">I may go to cinema</p>
            
            <p class="mb-40px">I need someone to take me home.</p>
            
            <p class="mb-40px">I can love you now</p>
            
            <!-- FINAL -->
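
The same S/R can be sketched in Python for readers who want to test the logic outside Notepad++. This is an adaptation, not the regex above: Python's re module has no branch reset group, so a lazy (.+?) followed by an optional </p> is swapped in for (?|(.+)</p>|(.+)). The names LINE_RE and wrap_untagged are hypothetical, and LF line endings are assumed.

```python
import re

OPEN_TAG = '<p class="mb-40px">'
CLOSE_TAG = "</p>"

# Same look-aheads as the SEARCH regex; only the last part is adapted.
LINE_RE = re.compile(
    r'^(?!.*<!-- START -->)(?!.*<!-- FINAL -->)'
    r'(?:(?!.*<p class)|(?!.*</p>))'
    r'(?:<p class="mb-40px">)?(.+?)(?:</p>)?$',
    re.MULTILINE,
)

def wrap_untagged(text):
    """Wrap every line that lacks at least one of the two tags."""
    return LINE_RE.sub(OPEN_TAG + r"\1" + CLOSE_TAG, text)

sample = "\n".join([
    "<!-- START -->",
    "",
    '<p class="mb-40px">I may go to cinema</p>',
    "",
    "I need someone to take me home.",
    "",
    "I need someone to take me home.</p>",
    "",
    '<p class="mb-40px">I need someone to take me home.',
    "",
    '<p class="mb-40px">I can love you now</p>',
    "",
    "<!-- FINAL -->",
])

print(wrap_untagged(sample))
```

Empty lines are left alone because the captured part needs at least one character.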
            

            Notes:

            • First, after the usual modifiers, come the two boundaries which must not be matched: (?!.*<!-- START -->)(?!.*<!-- FINAL -->)

            • Then, inside a non-capturing group, an alternation of two look-aheads requires that at least one of the two tags is absent from the line: (?:(?!.*<p class)|(?!.*</p>))

            • Now, after an optional (?:<p class="mb-40px">)?, also in a non-capturing group, the regex selects either:

              • All chars before the </p> tag
                OR
              • All remaining chars of the current line

            Remark:

            • Note the special syntax (?|(.+)</p>|(.+)). This is a branch reset group: the group numbering restarts at 1 in each alternative, so both (.+) parts are group 1. Thus, you just need the <p class="mb-40px">\1</p> syntax in the replacement part

            • If I had used a normal non-capturing group (?:(.+)</p>|(.+)), two groups, 1 and 2, would have been defined. So the correct replacement would have been <p class="mb-40px">\1\2</p>, which works because these two groups are mutually exclusive: only one of them is ever set
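
The two-group behaviour described in the remark can be checked with a small Python sketch. Python's re module has no branch reset group, so it only illustrates the plain (?:...) variant; the name TWO_GROUPS is hypothetical, the look-aheads are left out for brevity, and Python 3.5 or later is assumed (from that version, an unmatched group in the replacement is substituted by an empty string).

```python
import re

# Plain alternation instead of a branch reset group: two numbered groups.
TWO_GROUPS = re.compile(
    r'^(?:<p class="mb-40px">)?(?:(.+)</p>|(.+))$',
    re.MULTILINE,
)

closed = TWO_GROUPS.search("I need someone to take me home.</p>")
bare = TWO_GROUPS.search("I need someone to take me home.")

# Exactly one of the two groups is set for any given line.
print(closed.groups())
print(bare.groups())

# Writing both groups in the replacement therefore covers both cases.
result = TWO_GROUPS.sub(r'<p class="mb-40px">\1\2</p>', "a</p>\nb")
print(result)
```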

            Best Regards,

            guy038

              Robin Cruise @guy038

              @guy038 super, thanks.

              What would the generic regex be in this case? (I cannot figure out the last part.)

              (?-is)^(?!.*REGION-START)(?!.*REGION-FINAL)(?:(?!.*<p class)|(?!.*</p>))(?:<p class="mb-40px">)?(?|(.+)</p>|(.+))

                guy038

                Hi, @robin-cruise,

                You cannot use the generic regex, discussed in this topic:

                https://community.notepad-plus-plus.org/topic/22690/generic-regex-replacing-in-a-specific-zone-of-text

                to reach your present goal. Why?


                Well, because that generic regex supposes:

                • First, to match the BSR (the start of the region, REGION-START in your formula), followed by any range of chars, possibly empty, which does not contain the ESR (the end of the region, REGION-FINAL in your formula), and then, after a \K feature, to match the FR (the find regex)

                • Then, from the current caret position, to match any range of chars, possibly empty, which does not contain the ESR, and then, after a \K feature, to match the FR again


                But, in your present case, the INPUT lines to modify, like I need someone to take me home., do not contain the BSR or the ESR. So how do you expect the search regex to find these absent regions?

                Best Regards,

                guy038

                  Robin Cruise @guy038

                  @guy038

                  SEARCH: (?-is)^(?!.*<!-- START -->)(?!.*<!-- FINAL -->)(?:(?!.*<p class)|(?!.*</p>))(?:<p class="mb-40px">)?(?|(.+)</p>|(.+))

                  REPLACE: <p class="mb-40px">\1</p>

                  Your regex seems to be very good, except for one thing: if my HTML pages also contain code like the sample below, those lines get changed too.

                  So I need to change only the lines in the section between <!-- START --> and <!-- FINAL -->

                  <html lang="en">
                  <head>
                    <!-- Meta Tags -->
                    <meta charset="utf-8"/>
                  
                  <script type="application/ld+json">
                  {
                    "@context": "https://schema.org/", 
                    "@type": "Product", 
                    "name": "10 media farces of big days",
                    "image": "icon.jpg",
                    "description": "horses of Letea Delta Danube successfully saved,",
                    "brand": {
                      "@type": "Brand",
                      "name": "something"
                    },
                    "sku": "NFL",
                    "gtin8": "NFL",
                    "offers": {
                      "@type": "Offer",
                      "url": "https://something.html",
                      "priceCurrency": "RON",
                      "price": "0",
                      "priceValidUntil": "2022-02-15",
                      "availability": "https://schema.org/OnlineOnly"
                    },
                    "aggregateRating": {
                      "@type": "AggregateRating",
                      "ratingValue": "5",
                      "bestRating": "5",
                      "ratingCount": "6"
                    },
                    "review": {
                      "@type": "Review",
                      "reviewRating": {
                        "@type": "Rating",
                        "ratingValue": "5",
                        "bestRating": "5"
                      },
                      "author": {"@type": "Person", "name": "omehing"},
                      "publisher": {"@type": "Organization", "name": "omehing"}
                    }
                  }
                  </script>
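
The over-matching reported here can be reproduced with a short Python sketch. It reuses a Python adaptation of the SEARCH regex (a lazy group stands in for the branch reset group, which Python's re module lacks); LINE_RE is a hypothetical name. Any line outside the zone that lacks the two tags gets wrapped, JSON-LD lines included.

```python
import re

# Python adaptation of the line-based SEARCH regex.
LINE_RE = re.compile(
    r'^(?!.*<!-- START -->)(?!.*<!-- FINAL -->)'
    r'(?:(?!.*<p class)|(?!.*</p>))'
    r'(?:<p class="mb-40px">)?(.+?)(?:</p>)?$',
    re.MULTILINE,
)

# One line of the JSON-LD block, far away from the START/FINAL zone.
outside = '  "@type": "Product",'
wrapped = LINE_RE.sub(r'<p class="mb-40px">\1</p>', outside)

# The regex only looks at one line at a time, so this line is wrapped too.
print(wrapped)
```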
                  
                    guy038

                    Hi, @robin-cruise,

                    Once and for all, Robin, please post a complete, exact file which represents all the data that you need to change!

                    We cannot keep working this way if you do not provide real examples, because a regex depends very closely on the real text!

                    BR

                    guy038

                      Robin Cruise @guy038

                      @guy038

                      Yes, but I cannot copy/paste the entire HTML page. It is a very large amount of HTML code.

                        guy038

                        Hi, @Robin-cruise

                        If you don’t mind, just send me your file by e-mail !

                        Here is my temporary mail address :

                        BR

                        guy038

                          guy038

                          Hello @robin-cruise and All,

                          Ah… OK. Thanks for your attached HTML file with your mail. It’s always easier with a real example ;-))

                          Now, as you have just one <!-- ARTICOL START -->.......<!-- ARTICOL FINAL --> zone in your HTML file, the simple thing to do is:


                          • In search, to look for:

                            • Any char from the very start of the file till the end of the <!-- ARTICOL START --> line
                          • OR

                            • Any char from the <!-- ARTICOL FINAL --> line till the very end of your file
                          • OR ( scan of the lines between the <!-- ARTICOL START --> and <!-- ARTICOL FINAL --> boundaries )

                            • A possible <p class="mb-40px"> tag, beginning the current line ( group 1 )

                            • Followed by a single-line range of characters ( group 2 ):

                              • Till a </p> tag, ending the current line ( group 3 )
                            • OR

                              • Till the end of the current line
                          • In replacement, to rewrite:

                            • ( If the match is within the <!-- ARTICOL START -->.........<!-- ARTICOL FINAL --> zone, so when group 2 is defined )

                              • First, the <p class="mb-40px"> tag, if absent in the INPUT file ( group 1 not defined )

                              • Then all the contents of the current line ( $0 )

                              • And, finally, the </p> tag, if absent in the INPUT file ( group 3 not defined )

                          • OR

                            • The matched text, unchanged ( $0 ): either the range of chars from the start of the file up to and including the <!-- ARTICOL START --> line, or the range from the <!-- ARTICOL FINAL --> boundary to the end of the file ( both occur when group 2 is not defined )

                          For instance, from this INPUT file, below :

                          <!DOCTYPE html>
                          ....
                          bla bla
                          ....
                          blah bla
                          
                          <!-- ARTICOL START -->
                          
                          <p class="mb-40px">I need someone to take me home.</p>
                          
                          I need someone to take me home.
                          
                          I need someone to take me home.</p>
                          
                          <p class="mb-40px">I need someone to take me home.
                          
                          <!-- ARTICOL FINAL -->
                          
                          bla bla
                          ....
                          blah bla
                          ....
                          </html>
                          

                          The following regex S/R :

                          SEARCH (?s-i)^.+<!-- ARTICOL START -->\R|<!-- ARTICOL FINAL -->.+|(?-s)^(<p class="mb-40px">)?(?|(.+)(</p>)|(.+))$

                          REPLACE ?2(?1:<p class="mb-40px">)$0(?3:</p>):$0

                          Should give you the expected results :

                          <!DOCTYPE html>
                          ....
                          bla bla
                          ....
                          blah bla
                          
                          <!-- ARTICOL START -->
                          
                          <p class="mb-40px">I need someone to take me home.</p>
                          
                          <p class="mb-40px">I need someone to take me home.</p>
                          
                          <p class="mb-40px">I need someone to take me home.</p>
                          
                          <p class="mb-40px">I need someone to take me home.</p>
                          
                          <!-- ARTICOL FINAL -->
                          
                          bla bla
                          ....
                          blah bla
                          ....
                          </html>
                          

                          The message Replace All: 6 occurrences were replaced is displayed in the status bar:

                          • One for the part between <!DOCTYPE html> and <!-- ARTICOL START -->
                          • One for each non-empty line between <!-- ARTICOL START --> and <!-- ARTICOL FINAL --> ( 4 lines )
                          • One for the part between <!-- ARTICOL FINAL --> and the very end of file

                          Note that this final solution does not need any look-ahead structure, nor the \G syntax or other goodies!
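
For readers who want to check the idea outside Notepad++, here is a minimal Python sketch of the same approach. It is an adaptation under stated assumptions, not the S/R above: Python's re module has neither the branch reset group nor the conditional replacement syntax, so four groups and a callback are swapped in. The names ZONE_RE, fix_line and fix_zone are hypothetical; LF line endings, a single zone, and non-empty text before and after the zone are assumed.

```python
import re

OPEN_TAG = '<p class="mb-40px">'
CLOSE_TAG = "</p>"

ZONE_RE = re.compile(
    # Head of the file, up to and including the START line (kept as is).
    r'\A[\s\S]+<!-- ARTICOL START -->\n'
    # Tail of the file, from the FINAL boundary onwards (kept as is).
    r'|<!-- ARTICOL FINAL -->[\s\S]+'
    # One non-empty line of the zone: 1 = opening tag, 2 = text before a
    # closing tag, 3 = closing tag, 4 = text when there is no closing tag.
    r'|^(<p class="mb-40px">)?(?:(.+)(</p>)|(.+))$',
    re.MULTILINE,
)

def fix_line(m):
    """Callback that plays the role of the conditional replacement."""
    in_zone = m.group(2) is not None or m.group(4) is not None
    if not in_zone:
        return m.group(0)  # head or tail: rewritten unchanged
    prefix = "" if m.group(1) else OPEN_TAG
    suffix = "" if m.group(3) else CLOSE_TAG
    return prefix + m.group(0) + suffix

def fix_zone(text):
    """Return the fixed text and the number of replacements made."""
    return ZONE_RE.subn(fix_line, text)

sample = "\n".join([
    "<!DOCTYPE html>", "....", "bla bla", "",
    "<!-- ARTICOL START -->", "",
    '<p class="mb-40px">I need someone to take me home.</p>', "",
    "I need someone to take me home.", "",
    "I need someone to take me home.</p>", "",
    '<p class="mb-40px">I need someone to take me home.', "",
    "<!-- ARTICOL FINAL -->", "",
    "bla bla", "....", "</html>",
])

fixed, count = fix_zone(sample)
print(count)
print(fixed)
```

The head and the tail each count as one replacement, which is why the total is the number of non-empty lines in the zone plus two.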

                          Best Regards,

                          guy038

                            Robin Cruise @guy038

                            @guy038 said in Regex: Add html tags in the lines that doesn't have html tags:

                            SEARCH (?s-i)^.+<!-- ARTICOL START -->\R|<!-- ARTICOL FINAL -->.+|(?-s)^(<p class="mb-40px">)?(?|(.+)(</p>)|(.+))$
                            REPLACE ?2(?1:<p class="mb-40px">)$0(?3:</p>):$0

                            great answer, thank you @guy038
