    How to: Delete all lines in a .txt-document that occur in another .txt-document

    • Iskandar The Pupsi

      Hello,

      dunno if this can be done easily but I will try to ask anyway:

      I have a list of words in a foreign language and there might be a few English words
      in there as well…these I want to delete.
      My idea was: I have already found a list of the 5000 most common English words.
      So now my question: how can I delete those entries/lines in my list that also
      occur in the English language list?

      To make it clearer:
      I have a list called “foreign_language_vocab.txt” and a list called “English_vocab.txt”.
      Now I want to delete all lines in “foreign_language_vocab.txt” that also occur in “English_vocab.txt”. Thank you for the help!

      Best,
      Iskandar

      • dinkumoil

        Is there always one word per line in both files? If yes, a while ago I wrote a script for the NppExec plugin that does exactly what you want.

        Which version of Notepad++ do you use? If it is a version prior to v7.6, you can install the NppExec plugin using Plugin Manager. If you use v7.6, you can use the new built-in Plugin Admin.

        Once you have installed the plugin, come back for further instructions.

        • guy038

          Hello, @iskandar-the-pupsi, @dinkumoil and All,

          Nothing is impossible with regular expressions ;-))

          So, in a new N++ tab ( Ctrl + N ) :

          • Paste all the contents of the foreign_language_vocab.txt file

          • Add a line of at least three tilde characters ( ~~~ )

          • Paste all the contents of the English_vocab.txt file

          Here is an example, with a mix of French and American English words in the first part:

          # foreign_language_vocab.txt
          table
          church
          poisson
          girl
          couteau
          maison
          orange
          town
          world
          day
          école
          garçon
          car
          lit
          plate
          voiture
          star
          ~~~~~~~~~~~~~~~~~~~~
          # English_vocab.txt
          table
          man
          church
          girl
          knife
          town
          fork
          world
          country
          car
          house
          plate
          road
          light
          hammer
          box
          paper
          book
          vegetable
          orange
          castle
          forest
          wood
          bed
          desk
          water
          glass
          cat
          farm
          

          Now :

          • Open the Replace dialog

          • SEARCH (?-s)^(.+)\R(?s)(?=.+^\1$)|~~~.+

          • REPLACE Leave EMPTY

          • Tick the Wrap around option

          • Select the Regular expression search mode

          • Click on the Replace All button

          Et voilà ;-)) You get the expected result, below :

          # foreign_language_vocab.txt
          poisson
          couteau
          maison
          day
          école
          garçon
          lit
          voiture
          star
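
For readers who want to try the same idea outside Notepad++, here is a rough Python sketch of the search above. It is an approximation, not the regex from the post: Python's `re` module has no `\R`, so `\n` is assumed as the line ending, and scoped `(?s:...)` groups stand in for the `(?-s)` / `(?s)` switches. The function name `remove_common_lines` is hypothetical.

```python
import re

# Approximation of the Notepad++ search (?-s)^(.+)\R(?s)(?=.+^\1$)|~~~.+
# Assumptions: lines end with "\n" (Python's re has no \R), and the
# separator is a line of three tildes that appears in neither list.
PATTERN = re.compile(
    r"^(.+)\n(?=(?s:.+)^\1$)"  # a whole line that appears again, as a whole line, later on
    r"|~~~(?s:.+)",            # or the separator and everything after it
    re.MULTILINE,
)


def remove_common_lines(foreign_text: str, english_text: str) -> str:
    """Hypothetical helper: drop lines of foreign_text that occur in english_text."""
    # Mimic the manual steps: first list, a tilde line, then the second list.
    combined = foreign_text.rstrip("\n") + "\n~~~\n" + english_text
    return PATTERN.sub("", combined)


if __name__ == "__main__":
    foreign = "table\npoisson\ngirl\nday\n"
    english = "table\nman\ngirl\n"
    print(remove_common_lines(foreign, english), end="")
```

As in the Notepad++ version, only whole-line matches count, so "lit" is not removed just because "light" is in the English list.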
          

          Remarks :

          • The data in the two parts does not need to be sorted first !

          • If a word has the same spelling in the two languages, it is removed ! ( the case of the words “table” and “orange” )

          • If a word in the foreign_language_vocab.txt file is not part of the English_vocab.txt file, it is not removed, even if it is an English word ( the case of the remaining words “day” and “star” )
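
The remarks above can also be checked with a plain set lookup instead of a regex. This is a different technique from the one in this post, sketched in Python under the assumption of one word per line and exact, case-sensitive matching; `filter_lines` is a hypothetical name.

```python
def filter_lines(foreign_lines, english_lines):
    """Hypothetical helper: keep the foreign lines that are not in the English list."""
    english = set(english_lines)  # set lookup, so neither list has to be sorted
    return [line for line in foreign_lines if line not in english]


if __name__ == "__main__":
    foreign = ["table", "poisson", "day", "orange", "star"]
    english = ["orange", "man", "table"]
    # "table" and "orange" are dropped; "day" and "star" stay because they
    # are missing from this (deliberately short) English list.
    print(filter_lines(foreign, english))
```

To apply it to the two files from the question, each file could be read and split into lines with `str.splitlines()` before calling the function; the file encoding is not stated in the thread, so it would have to be chosen to match the files.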

          Best Regards

          guy038

          • Scott Sumner @guy038

            @guy038 said:

            Nothing is impossible with regular expressions

            There should be a qualifier: …unless your regular expression happens to select all the text in your document. :-)

            • guy038

              Hi, @scott-sumner and All,

              Note that I did not say “Nothing is impossible with N++ regular expressions” ;-))

              Cheers,

              guy038
