
    Find matching word between two text file

    6 Posts 3 Posters 5.1k Views
    • Frederick Smith

      Hi guys, I need some help here.
      Let's say I have 2 text files, each with hundreds of lines:
      text file A and text file B.
      I want to match all words in A against B, so that any word from A that is found in B gets highlighted.
      Example: let's say file A has the word CAR and file B has the word CARPOOL.
      It would match CAR and highlight it (so only CAR would be highlighted, not the whole word CARPOOL).
      Or all matching words could be saved to a new file; either would be great.
      So file A is the "source": match any/all words from file A against file B (if they exist).
      I tried Compare, but it shows the differences; I need the matches.
      Thanks in advance for your help.
      Frederick

      • Terry R

        Hi @Frederick Smith
        your problem was interesting. It was very similar in nature to a solution presented by @guy038, namely:
        https://notepad-plus-plus.org/community/topic/16335/multiline-replace-multiple-hosts-in-hostsfile
        In that instance the question was how to remove lines when duplicates are found. In essence, though, the search method here works very much like that one.

        I'm going to assume that file A contains 1 word per line; if not, we need file A in that format (when you copy lines you ONLY want the word which is duplicated, not additional words on the same line). So open file A first, then add a "---" line (three dashes) at the bottom as the last line. Then add the contents of file B below it.

        Open the Mark function and use the following:
        Find What: (?is)^(.+)\R(?=.*---.*\1)
        You need search mode set to regular expression (very important) and wrap around ticked. Also tick Bookmark Lines, this will help later.

        Place the cursor at the top-left position of the file, i.e. at the top of the file A contents, otherwise the result will be unpredictable. You only need to click the Mark All button once. Any file A contents which also appear in the file B area (below the --- line) will be marked, and the line will also be bookmarked (blue circle in the margin). The --- line stops attempts to find duplicates within the file B area.

        Now use the “Search” menu option, select “Bookmark”, then “Copy Bookmarked Lines”. Put the copied lines elsewhere, which is what you requested.

        My regex includes the (?is) modifiers: s means the dot also matches line-break characters (such as CR and LF), just like any other character; i means a case-insensitive search, so "CAR" would also find "car", "Car", "cAr" etc.
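For anyone who wants to try the same idea outside Notepad++, here is a minimal Python sketch of the search above. It is only an approximation: Python's re module has no \R, so \r?\n is used in its place, and re.M is passed so that ^ matches at the start of every line. The two word lists are made up for illustration.

```python
import re

# Illustrative word lists (assumption: file A holds one word per line).
file_a = ["car", "apple", "beach", "hello", "down", "sun", "question"]
file_b = ["city", "whatever", "carpool", "san", "beachcity", "cornel", "downpillow"]

# File A first, then the --- separator line, then file B.
combined = "\n".join(file_a + ["---"] + file_b)

# Same regex as in the Mark dialog, with \R replaced by \r?\n for Python.
# re.M lets ^ match at every line start, which Notepad++ does by default.
pattern = re.compile(r"(?is)^(.+)\r?\n(?=.*---.*\1)", re.M)

# Each match is a whole line of file A whose text also occurs below ---.
marked = [m.group(1) for m in pattern.finditer(combined)]
print(marked)  # ['car', 'beach', 'down']
```

As in the editor, only the lines in the file A part are reported; the words in file B that contain them are not.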

        I hope this helps, otherwise come back with more info including samples of actual file A and B contents if you can.

        Terry

        • Frederick Smith

          Hi Terry,
          Thanks a lot for taking the time and responding to my question.
          First - you're correct in your assumption.
          ALMOST THERE…
          At first it didn't work; then, looking at your regex, I realized it calls for "---" (3 dashes), not "-", so once I changed that it WORKED!
          With one exception!
          The only thing is that it marks the file A part, not the file B part
          (and I would need the file B part to be marked).

          • I tried swapping the files around, but that didn't work.

          These are not real files, just a sample to illustrate…

          This is file A:
          car
          apple
          beach
          hello
          down
          sun
          question

          This is file B:
          city
          whatever
          carpool
          san
          beachcity
          cornel
          downpillow

          I opened file A and changed it to this:
          car
          apple
          beach
          hello
          down
          sun
          question
          ---
          city
          whatever
          carpool
          san
          beachcity
          cornel
          downpillow

          So instead of marking: car, beach, down
          I would need it to mark: carpool, beachcity, downpillow
          So "car" would be highlighted in "carpool".

          So how can I change the "Mark" search to get that result?

          Thanks again Terry!

          • guy038

            Hello, @frederick-Smith; @terry-r and All,

            Of course, with your additional information, it becomes easier to point out the suitable regex ! I hope that Terry won’t mind if I reply to you, first ;-))


            Actually, you have two files : File_A which contains a list of strings, which, possibly, are subsets of some words contained in the File_B list !

            Then, we’re going to reverse the logic :

            • First, in a new N++ tab, copy/paste the File_B.txt contents

            • Add the single line ---

            • Then, under this line, insert the File_A contents

            • Open the Mark dialog

            • Use the regex search :

            (?si)(.+)(?=.*^---\R.*^\1$)

            • Preferably, tick the Purge for each search option

            • Click on the Mark All button


            So, given File_B contents, below :

            city
            whatever
            carpool
            san
            beachcity
            cornel
            downpillow
            

            and File_A contents, below :

            car
            cornel
            apple
            beach
            hello
            ever
            down
            sun
            it
            question
            

            Just note that I added 3 words ever, cornel and it, in order to show that “subset-words” can be marked, also, in middle or at end of the whole word or that the entire word can be highlighted !

            Now, we add, in a new tab, the following text :

            city
            whatever
            carpool
            san
            beachcity
            cornel
            downpillow
            ---
            car
            cornel
            apple
            beach
            hello
            ever
            down
            sun
            it
            question
            

            Finally, using the Mark dialog and the regex (?si)(.+)(?=.*^---\R.*^\1$), it should highlight the parts shown in square brackets, below :-))

            c[it]y
            what[ever]
            [car]pool
            san
            [beach]c[it]y
            [cornel]
            [down]pillow

            Notes :

            • As usual, the (?si) modifiers mean an insensitive to case search and that any dot ( . ) will match any single character ( Standard and EOL )

            • Then, the main part (.+) tries to match the longest, non-null, amount of characters, even across several lines, stored as group 1, but ONLY IF the positive look-ahead (?=.*^---\R.*^\1$) is TRUE. That is to say, IF it detects :

              • A range of any characters, possibly empty, .* ,

              • followed by a line with, only, 3 dashes and its line-break, ^---\R ,

              • followed, again, by the longest range, possibly null, of any characters, .* ,

              • and ending with the contents of group 1, alone on its line, ^\1$


            Remark : if you prefer a sensitive to case search, simply use the first part (?s-i), instead !
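The reversed logic can also be tried outside N++ with a small Python sketch. This is an approximation, not the editor's own engine: \R is replaced with \r?\n because Python's re module does not support it, and the modifiers are passed as flags so that case sensitivity can be switched. The function name mark and its ignore_case parameter are hypothetical names chosen for this example.

```python
import re

# File_B contents first, then the --- line, then the File_A contents.
file_b = ["city", "whatever", "carpool", "san", "beachcity", "cornel", "downpillow"]
file_a = ["car", "cornel", "apple", "beach", "hello", "ever", "down", "sun", "it", "question"]
combined = "\n".join(file_b + ["---"] + file_a)

def mark(text, ignore_case=True):
    """Return (offset, matched_text) pairs, roughly what Mark All would highlight."""
    # re.S: dot matches line breaks; re.M: ^ and $ match at every line.
    flags = re.S | re.M | (re.I if ignore_case else 0)
    # Group 1 must reappear, alone on its line, somewhere after the --- line.
    pattern = re.compile(r"(.+)(?=.*^---\r?\n.*^\1$)", flags)
    return [(m.start(), m.group(1)) for m in pattern.finditer(text)]

hits = mark(combined)
print([word for _, word in hits])
# ['it', 'ever', 'car', 'beach', 'it', 'cornel', 'down']
```

Calling mark(text, ignore_case=False) corresponds to the case-sensitive variant: with "CARpool" in the File_B part, "car" from File_A would no longer be found.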

            Cheers,

            guy038

            • Terry R

              @Frederick-Smith said:

              It would match the word: CAR - and highlight it. (so only CAR would be highlighted - not: word: CARPOOL.

              I interpreted that as being the word in file A being highlighted, so what you really meant was the letters CAR in carpool would be highlighted as CAR also existed in file A. Sorry about that and the confusion over the 3 “-”, sometimes characters don’t show well, it’s the interpreter (behind the compose window) that causes most of the issues. As @guy038 has given you another solution to fit your requirements I’ll let it be.

              Be sure to come back if you need anything elaborated, or any further help.

              Terry

              • Frederick Smith

                Hi @terry-r, @guy038 and All

                First I want to thank you both: @terry-r and @guy038 - for taking your time and giving me help.

                Both solutions work, maybe a bit differently, but both give the good results I was looking for.

                Let me say how much I appreciate the community. Thank you!

                Thanks again guys!
