    How to separate specific text with notepad?

    22 Posts 4 Posters 10.6k Views

    • Terry R

      @yuly-pmem
      A question on what you want to achieve. From the example it would seem that once the text has been divided, you want all the first bits one after the other (line after line) in the file, then possibly a blank line followed by all the 2nd bits of each line, again one after the other (line after line). Is this correct?

      Otherwise I wonder if you mean
      literature:trouble
      18:trouble
      history:medicine
      10:medicine

      Both are possible with a regular expression (regex), though one will require more steps. If the answer to my question is yes, I can see it taking maybe three regexes.

      Terry

      • yuly pmem

        @Terry-R said:

        Both are possible with a regular expression (regex), one however is going to require more steps. I can see it maybe requiring 3 regex’s to achieve if the answer to my question is true

        Thanks for answering, friend. I need to separate them, and then I’ll copy them into two different files. They are not necessarily the same code.

        • Terry R

          So, once they are separated, do you care whether the “first bits” or the “second bits” stay in their original order? If not, that makes it very simple.

          I’m thinking that the regex would add a number to the start of every second line (like 999999) and once all lines were divided, you’d sort the lines. That would put each of the groups in separate areas of the file.

          Terry

          • yuly pmem

            @yuly-pmem said:

            18:trouble
            10:medicine
            09:nature

            Friend, excuse me, but I do not understand. Is there a regular expression to separate them? Something like:
            Find what:
            Replace with:
            Search mode: ☑ Regular expression
            Then I would copy each part and paste it into another file:
            First file
            literature: trouble
            history: medicine
            algebra: nature
            Second file
            18: trouble
            10: medicine
            09: nature

            • Terry R

              I understand that the second group of lines already has a number at the start, but to actually separate those lines from the first group a sort is needed, and that is going to change the order.

              So in effect you would have:
              algebra:nature
              history:medicine
              literature:trouble

              followed by the 2nd group which have the numbers at the start.

              My idea is as follows. The regex to transform 1 line into 2 lines is:
              Find what: ^(?i)([a-z]+?)(:)(\d{2}):([a-z]+?)(\R)
              Replace with: \1\2\4\591919 \3\2\4\5

              So this would create (using your example):
              literature:trouble
              91919 18:trouble
              history:medicine
              91919 10:medicine
              algebra:nature
              91919 09:nature

              Now use the sort option: Edit > Line Operations > Sort Lines Lexicographically Descending.

              This produces:
              literature:trouble
              history:medicine
              algebra:nature
              91919 18:trouble
              91919 10:medicine
              91919 09:nature

              Then another regex to remove the numbers.
              Find what: ^91919\h
              Replace with: empty field here

              So you’d finish up with:
              literature:trouble
              history:medicine
              algebra:nature
              18:trouble
              10:medicine
              09:nature

              As you can see it hasn’t affected the order of the numbered group, but it has changed the order of the first grouping.

              Terry
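
For anyone who wants to check the three steps outside Notepad++, here is a minimal Python sketch of the same idea. It is not Terry's exact regex: Python's `re` module has no `\R` or `\h`, so the pattern below relies on `re.MULTILINE` and assumes plain `\n` line endings. The function name `split_with_marker` is made up for illustration, and Python's sort order may differ from Notepad++'s for non-ASCII text.

```python
import re

MARKER = "91919"  # same sentinel as in the post; assumed not to occur in the real data

def split_with_marker(text: str) -> str:
    """Split 'word:NN:word' lines in two, sort descending, then drop the marker."""
    pattern = re.compile(r"^([a-z]+):(\d{2}):([a-z]+)$", re.IGNORECASE | re.MULTILINE)
    # Step 1: each matching line becomes two lines; the second gets the marker.
    doubled = pattern.sub(
        lambda m: f"{m[1]}:{m[3]}\n{MARKER} {m[2]}:{m[3]}", text
    )
    # Step 2: descending sort puts letter-initial lines before the marked ones.
    lines = sorted(doubled.splitlines(), reverse=True)
    # Step 3: remove the marker and the single space after it.
    return "\n".join(re.sub(rf"^{MARKER} ", "", line) for line in lines)

if __name__ == "__main__":
    sample = "literature:18:trouble\nhistory:10:medicine\nalgebra:09:nature"
    print(split_with_marker(sample))
```

Because both groups are sorted, the original line order only survives when the input already happens to be in descending order, as it does in this sample.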

              • Terry R

                It is possible to do this without the 91919 sequence, but as I wasn’t sure what your data looks like, I wanted a marker that was very unlikely to appear elsewhere in it, to distinguish the 2nd part of each line that I created.

                Terry

                • Terry R

                  Actually, reading through the examples again, I overlooked that the first group stayed in the same order. I think that was only luck: we used a descending sort, and the first group happened to be in descending order already.

                  If however the sort changed the order to, say
                  algebra
                  history
                  literature

                  does that concern you?

                  Terry

                  • yuly pmem

                    Friend, I found a regex for the first group: I managed to separate it with (:.*?:). Now I am only missing the second part. Do you know a regular expression for it?

                    literature:18:trouble
                    history:10:medicine
                    algebra:09:nature

                    to

                    18:trouble
                    10:medicine
                    09:nature

                    • Terry R

                      What say I start again as I may have confused you with lots of options.

                      Try the following on your original file.
                      Find what: ^(?i)([a-z]+?)(:)(\d{2}):([a-z]+?)(\R)
                      replace with: \1\2\4\5\3\2\4\5

                      Once this is done, use the sort function to move the 2 different line types apart: Edit > Line Operations > Sort Lines Lexicographically Descending.

                      See what that produces for you. If you are happy then just copy the 2nd group elsewhere (another file).

                      If the result is NOT what you wanted let us know and maybe someone can give you a different regex to achieve it.

                      Terry
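
A short Python sketch of why the plain descending sort separates the two line types without any marker: in ASCII, digits sort before letters, so a descending sort lists the letter-initial lines first. This is only an illustration under the assumption of `\n` line endings and ASCII data; `split_and_sort` is a hypothetical name, and the pattern is an adaptation because Python's `re` has no `\R`.

```python
import re

def split_and_sort(text: str) -> list[str]:
    """Turn each 'word:NN:word' line into two lines, then sort descending."""
    pattern = re.compile(r"^([a-z]+):(\d{2}):([a-z]+)$", re.IGNORECASE | re.MULTILINE)
    doubled = pattern.sub(lambda m: f"{m[1]}:{m[3]}\n{m[2]}:{m[3]}", text)
    # Digits ('0'-'9') compare lower than letters, so they end up last.
    return sorted(doubled.splitlines(), reverse=True)

if __name__ == "__main__":
    # Input deliberately not in descending order, to show the reordering.
    for line in split_and_sort("algebra:09:nature\nliterature:18:trouble"):
        print(line)
```

Note that the sort reorders the lines inside each group as well, so this only suits data where the original order does not matter.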

                      • yuly pmem

                        Thank you, friend, for helping. It works well for the 5 lines, but what about 2000 or n lines?

                        • Terry R

                          You use search mode regular expression and hit the “Replace All” button. It should change the entire file. Have wrap around ticked as well.

                          Terry

                          • yuly pmem

                            I have about 2000 lines. I selected the Regular expression search mode and clicked the “Replace All” button, but it does not work.

                            Note: the replacement \1\2\4\5\3\2\4\5, is it only for 5 lines?

                            • Terry R

                              Possibly the remainder of the lines do not fit the regex. Are the numbers 3 digits or more? My regex will only select 2-digit numbers, as that’s what your example showed.

                              Where it says \d{2}, change the 2 to 3 if the numbers always have exactly 3 digits. If the number of digits varies, change the 2 to a range such as 2,4, which gives \d{2,4}. You may even need to increase the 4 further, to say 8, depending on the range of numbers you have.
                              Terry
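
To see how the quantifier decides which lines are picked up, here is a small Python sketch. The last two sample lines are invented for illustration, and `matching_lines` is a hypothetical helper; the `{n}` and `{n,m}` quantifiers behave the same way in Python's `re` as in the Notepad++ regex above.

```python
import re

# Hypothetical sample lines: the last two are invented to show longer numbers.
LINES = ["history:10:medicine", "biology:123:cells", "physics:12345:waves"]

def matching_lines(digit_quantifier: str) -> list[str]:
    """Return the sample lines matched when the digits use the given quantifier."""
    # The quantifier text, e.g. "{2}" or "{2,4}", is placed right after \d.
    pattern = re.compile(rf"^[a-z]+:\d{digit_quantifier}:[a-z]+$", re.IGNORECASE)
    return [line for line in LINES if pattern.match(line)]

if __name__ == "__main__":
    for quantifier in ("{2}", "{3}", "{2,4}", "+"):
        print(quantifier, matching_lines(quantifier))
```

With `{2}` only the 2-digit line matches, `{2,4}` also accepts the 3-digit line, and `+` accepts any number of digits.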

                              • guy038

                                Hello, @yuly-pmem, @terry-r and All,

                                I didn’t fully read all the posts, yet, but, personally, I would use the following method :

                                • Make 2 copies of your text

                                • Open the first copy in N++

                                • Open the Replace dialog ( Ctrl + H )

                                SEARCH :\d+ (the leading colon is part of the regex)

                                REPLACE Leave EMPTY

                                • Select the Regular expression search mode

                                • Tick the Wrap around option

                                • Click on the Replace All button

                                You should get the expected text :

                                literature:trouble
                                history:medicine
                                algebra:nature
                                
                                • Open the second copy, in N++

                                SEARCH (?-s)^.+?:(?=\d+)

                                REPLACE Leave EMPTY

                                This time, you should get the following text :

                                18:trouble
                                10:medicine
                                09:nature
                                

                                Note that I use a look-ahead structure, (?=\d+), just in case your text contains other lines ( as, for instance, Section 1: or Example 2: ) with a : symbol, not followed with digits !

                                Best Regards

                                guy038
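
The two replacements can be tried out in Python as a rough sketch. Two adaptations are assumptions on my side: Python's `re` does not accept a bare `(?-s)`, so it is dropped (`.` does not match newlines by default there), and `re.MULTILINE` is needed so that `^` matches at every line start. The `Section 1: intro` line is added to show the effect of the look-ahead; the function names are made up.

```python
import re

def first_copy(text: str) -> str:
    """Drop the ':digits' part, keeping 'first:third'."""
    return re.sub(r":\d+", "", text)

def second_copy(text: str) -> str:
    """Drop everything up to the colon that is followed by digits."""
    return re.sub(r"^.+?:(?=\d+)", "", text, flags=re.MULTILINE)

if __name__ == "__main__":
    sample = (
        "literature:18:trouble\n"
        "history:10:medicine\n"
        "Section 1: intro\n"
        "algebra:09:nature"
    )
    print(first_copy(sample))
    print(second_copy(sample))
```

The `Section 1: intro` line is left untouched by both replacements, because its colon is not followed by a digit.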

                                • ani rodet

                                  Friends guy038 and Terry R: \d+ is for digits in the range 0-9, but what if the case were like this:

                                  freddy: letters@sout.com: darkkk12
                                  What would the method be to separate them? The previous method does not work.

                                  • yuly pmem

                                    I don’t know

                                    • Terry R

                                      @yuly-pmem are you able to tell us how you got on with the supplied regexes? Have you tried any, and if so, what were the results?

                                      In order for us to help further we would need to know what you have tried, what didn’t work and also some more examples if a particular regex did NOT work as expected.

                                      @guy038 had a good idea. By copying the data, so you have 2 copies, you can create the individual groups you want independently. That also means once you have altered the text, it will still be in the same order as it started with. My idea would possibly have changed the order and that may not be what you wanted.

                                      Terry

                                      • yuly pmem

                                        Friend Terry, yes, it works when there are numbers (literature: 18: trouble, history: 10: medicine), but in some lines there are only letters, like this:
                                        history: text: ready
                                        medicine: small: student
                                        Thanks anyway, friend, for wanting to help me.
                                        I will keep looking for the solution.
                                        Sincerely, yuli

                                        • Terry R

                                          From your last example it would appear that your data can be described as:
                                          string#1 then a : (colon) then string#2 then a : (colon) then string#3
                                          And furthermore string#2 may be some digits.
                                          And you would like it to be
                                          string#1:string#3
                                          and
                                          string#2:string#3
                                          If the : is the delimiter then it should be easy enough to provide you a regex to change the data.

                                          First off, as @guy038 says, copy the entire file to another tab in Notepad++. So you should have 2 identical copies of the file (make sure the 2nd copy has a different file name, as they need to be saved as different files). Add a blank line at the bottom of both files, so that the last data line ends with a line break; the regexes below need the \R at the end of every line.

                                          In the 1st tab use the following regex to alter the text
                                          Find what: ^(.+?):.+?(:.+?\R)
                                          Replace with: \1\2
                                          search mode is “regular expression” and “wrap around” ticked.
                                          Once this is run you can remove the last blank line and save this file.

                                          In the 2nd tab (so this is the copy of the original file) use the regex:
                                          Find what: ^.+?:(.+?:.+?\R)
                                          Replace with: \1 (this keeps the captured string#2:string#3 part; leaving the field empty would delete the whole line, because the regex matches the entire line)
                                          search mode is “regular expression” and “wrap around” ticked.
                                          Once this is run you can remove the last blank line and save this file. Make sure this is a different file name, otherwise you will overwrite the results from the first regex.

                                          I hope this helps. My solution does rest on my description being accurate. If it is not then you need to provide it similar to how I did.

                                          Terry

                                          PS as you have found out, your original example wasn’t good enough for us to help you properly. My description, had you included that at the start would have provided the extra information needed to supply you with a good solution.
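
As a cross-check of the two replacements, here is a hedged Python sketch. It is an adaptation, not the Notepad++ regexes verbatim: `\R` does not exist in Python's `re`, so the patterns end with `$` under `re.MULTILINE`, which also means no trailing blank line is needed here. The function names are hypothetical, and the sample uses the thread's lines without spaces after the colons.

```python
import re

def keep_first_and_third(text: str) -> str:
    """string#1:string#2:string#3 becomes string#1:string#3."""
    return re.sub(r"^(.+?):.+?(:.+)$", r"\1\2", text, flags=re.MULTILINE)

def keep_second_and_third(text: str) -> str:
    """string#1:string#2:string#3 becomes string#2:string#3."""
    # The captured group must be written back; an empty replacement
    # would remove the whole line, since the pattern matches all of it.
    return re.sub(r"^.+?:(.+?:.+)$", r"\1", text, flags=re.MULTILINE)

if __name__ == "__main__":
    sample = "history:text:ready\nmedicine:small:student\nliterature:18:trouble"
    print(keep_first_and_third(sample))
    print(keep_second_and_third(sample))
```

Both functions treat the first colon on a line as the end of string#1, so they only fit data where string#1 itself contains no colon.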

                                          • guy038

                                            Hello, @yuly-pmem, @ani-rodet, @terry-r and All,

                                            Ah OK ! So, here are, below, all the regexes to achieve the suppression of 1 or 2 columns, from an original 3-columns table, separated with colons ( : )

                                            Let’s imagine the initial 3-columns table, below :

                                            cell A1:cell B1:cell C1
                                            cell A2:cell B2:cell C2
                                            cell A3:cell B3:cell C3
                                            

                                            Then :

                                            • With the regex S/R :

                                            SEARCH :[^:\r\n]+$

                                            REPLACE Leave EMPTY

                                            Only columns A and B remain :

                                            cell A1:cell B1
                                            cell A2:cell B2
                                            cell A3:cell B3
                                            
                                            • With the regex S/R :

                                            SEARCH (?-s):.+(?=:)

                                            REPLACE Leave EMPTY

                                            Only columns A and C remain :

                                            cell A1:cell C1
                                            cell A2:cell C2
                                            cell A3:cell C3
                                            
                                            • With the regex S/R :

                                            SEARCH (?-s)^.+?:(?=.+:)

                                            REPLACE Leave EMPTY

                                            Only columns B and C remain :

                                            cell B1:cell C1
                                            cell B2:cell C2
                                            cell B3:cell C3
                                            
                                            • With the regex S/R :

                                            SEARCH (?-s):.+$

                                            REPLACE Leave EMPTY

                                            Only column A remains :

                                            cell A1
                                            cell A2
                                            cell A3
                                            
                                            • With the regex S/R :

                                            SEARCH (?-s).+:(.+):.+

                                            REPLACE \1

                                            Only column B remains :

                                            cell B1
                                            cell B2
                                            cell B3
                                            
                                            • With the regex S/R :

                                            SEARCH (?-s)^.+:

                                            REPLACE Leave EMPTY

                                            Only column C remains :

                                            cell C1
                                            cell C2
                                            cell C3
                                            

                                            Cheers,

                                            guy038
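
The six search/replace pairs can be checked against the sample table with a small Python sketch. Assumptions on my side: the `(?-s)` prefix is dropped because Python's `re` does not accept it in that form (and `.` excludes newlines by default), `re.MULTILINE` is added so `^` and `$` work per line, and the names `RULES` and `keep_columns` are invented for illustration.

```python
import re

TABLE = (
    "cell A1:cell B1:cell C1\n"
    "cell A2:cell B2:cell C2\n"
    "cell A3:cell B3:cell C3"
)

# Columns kept -> (search pattern, replacement), adapted from the post.
RULES = {
    "AB": (r":[^:\r\n]+$", ""),
    "AC": (r":.+(?=:)", ""),
    "BC": (r"^.+?:(?=.+:)", ""),
    "A": (r":.+$", ""),
    "B": (r".+:(.+):.+", r"\1"),
    "C": (r"^.+:", ""),
}

def keep_columns(text: str, which: str) -> str:
    """Apply the rule that leaves only the named columns."""
    pattern, replacement = RULES[which]
    return re.sub(pattern, replacement, text, flags=re.MULTILINE)

if __name__ == "__main__":
    for which in RULES:
        print(which, keep_columns(TABLE, which).splitlines())
```

These rules assume exactly three colon-separated columns per line; with more colons the greedy and lazy parts would land on different separators.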
