I have two files A and B. I want to find all the lines in A that are not in B. What's the fastest way to do this in bash/using standard linux utilities? Here's what I tried so far:
for line in `cat file1`
do
if [ `grep -c "^$line$" file2` -eq 0]; then
echo $line
fi
done
It works, but it's slow. Is there a faster way of doing this?
The BashFAQ describes doing exactly this with comm, which is the canonically correct method.
# Subtraction of file1 from file2
# (i.e., only the lines unique to file2)
comm -13 <(sort file1) <(sort file2)
diff is less appropriate for this task, as it tries to operate on blocks rather than individual lines; as such, the algorithms it has to use are more complex and less memory-efficient.
comm has been part of the Single Unix Specification since SUS2 (1997).