Here’s an AWK script that can do the trick for you:
    # If there is something other than whitespace on a line:
    NF {
        # Use the text as an array index and count how many times it appears
        Line[$0]++
    }

    # Once the whole file is done, spit out every line that was duplicated
    # 2 or more times, repeated the number of times it was duplicated.
    #
    # If Line[line] == 1, then the line appeared only 1 time (it is unique).
    # If Line[line] > 1, then the line appeared that many times.
    END {
        for (line in Line) {
            for (i = 1; Line[line] > 1 && i <= Line[line]; i++) {
                print line
            }
        }
    }

I use GNU AWK for Windows (gawk.exe). If you save the script as dup.awk, then:
    gawk -f .\dup.awk <name of your 90000 line file> > dupout.txt

will create dupout.txt with all the duplicated lines. I used the data in your original post and let the output go to standard out:
    C:\temp\awk>type input.txt
    919913209647 02:38:47
    919979418778 02:57:03
    918980055979 02:46:12
    919428616318 02:46:32
    919512672560 02:46:33
    919512646084 02:46:52
    919512497164 02:48:13
    919512497164 02:48:13
    919913029225 02:50:23
    917567814941 03:02:35
    919537722335 03:18:41
    918980299814 03:24:49
    919727009323 03:29:44

    C:\temp\awk>gawk -f .\dup.awk input.txt
    919512497164 02:48:13
    919512497164 02:48:13

    C:\temp\awk>
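If you would rather see each duplicated line printed once along with how many times it occurred, a small variation of the END block (my own sketch, not part of the script above) should do it:

    # Variation: print each duplicated line once, followed by its count.
    END {
        for (line in Line) {
            if (Line[line] > 1) {
                print line, "appeared", Line[line], "times"
            }
        }
    }

With the sample data, that would print a single line like "919512497164 02:48:13 appeared 2 times" instead of repeating the duplicate.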