Tags: bash, sorting, optimization, sed, comm

How to remove common lines between two files without sorting?


I have two unsorted files that have some lines in common.

file1.txt

Z
B
A
H
L

file2.txt

S
L
W
Q
A

This is how I currently remove the common lines:

sort -u file1.txt > file1_sorted.txt
sort -u file2.txt > file2_sorted.txt

comm -23 file1_sorted.txt file2_sorted.txt > file_final.txt

Output:

B
H
Z

The problem is that I want to keep the original order of file1.txt:

Desired output:

Z
B
H

One solution I thought of is to loop over the lines of file2.txt and, for each line, run:

sed -i "/^${line_file2}\$/d" file1.txt

But if the files are big, performance may suck, since this rewrites file1.txt once for every line of file2.txt.

  • Do you like my idea?
  • Do you have an alternative way to do it?

Solution

  • grep or awk:

    awk 'NR==FNR{a[$0]=1;next}!a[$0]' file2.txt file1.txt
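
To unpack the awk one-liner above: NR==FNR is true only while awk is reading the first file named on the command line, so that pass stores every line of file2.txt as an array key, and the second pass prints only the lines of file1.txt that were never stored. The answer mentions grep without showing a command, so the grep line below is my assumption of what was meant (-F fixed strings, -x whole-line match, -v invert, -f read patterns from a file). A minimal sketch using the sample data from the question; the array name seen, the variable names and the temporary directory are illustrative only:

```shell
# Work in a scratch directory so no existing files are overwritten.
cd "$(mktemp -d)" || exit 1

# Recreate the sample files from the question.
printf '%s\n' Z B A H L > file1.txt
printf '%s\n' S L W Q A > file2.txt

# awk: while reading the first file (file2.txt), remember each line as an
# array key and skip to the next record. While reading file1.txt, print only
# the lines that were not remembered. Input order of file1.txt is preserved.
awk_result=$(awk 'NR==FNR { seen[$0] = 1; next } !seen[$0]' file2.txt file1.txt)

# grep (assumed equivalent): treat each line of file2.txt as a fixed string
# that must match a whole line, and print the lines of file1.txt that do not match.
grep_result=$(grep -Fvxf file2.txt file1.txt)

printf '%s\n' "$awk_result"
```

Both commands read each file once and need no sorting, so the order of file1.txt is kept. One caveat with the awk version: if file2.txt is empty, NR==FNR stays true while file1.txt is read, so nothing is printed; the grep version prints all of file1.txt in that case.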