# comm -12 /tmp/src /tmp/txt | wc -l
10338
# join /tmp/src /tmp/txt | wc -l
10355
Both files are single columns of alphanumeric strings, and both are sorted. Shouldn't the two counts be the same?
Update, following @Kevin's answer below:
cat /tmp/txt | sed 's/^[:space:]*//' > /tmp/stxt
cat /tmp/src | sed 's/^[:space:]*//' > /tmp/ssrc
and the result:
# join /tmp/ssrc /tmp/stxt | wc -l
516
# comm -12 /tmp/ssrc /tmp/stxt | wc -l
513
On manual inspection of the diffs, the results differ because of some whitespace that the sed did not remove.
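I haven't used either extensively, but from a quick look at the man pages and some test input, it seems that when the two files differ, comm prints lines from both files (lines only in the first, lines only in the second, and common lines, in three columns), while join prints only the matching lines. The -12 suppresses the first two columns, so that difference is already taken care of. You could save the output of each command to a file and diff the two to see where they differ.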
$ echo -e '1\n2\n3\n5' > a
$ echo -e '1\n2\n4\n5' > b
$ comm a b
		1
		2
3
	4
		5
$ join a b
1
2
5
$
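Edit: By default, join compares only the first field (fields are separated by blanks, and leading blanks are ignored), whereas comm compares whole lines. Whitespace that differs between otherwise identical lines, such as a trailing space, or anything after the first blank, therefore makes the two outputs differ.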
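To illustrate that, here is a minimal sketch assuming bash, a POSIX `sed`, and two hypothetical sample files `p` and `q` that differ only by a trailing space; it also shows one way to normalize the inputs before comparing. In POSIX regular expressions, `[:space:]` is a character class only inside a bracket expression, written `[[:space:]]`. On its own, `[:space:]` is an ordinary bracket expression matching one of the characters `:`, `s`, `p`, `a`, `c` or `e` (some GNU versions may reject it with an error instead), so the `s/^[:space:]*//` in the question cannot remove whitespace at all. It also only looks at the start of the line, while trailing whitespace matters to comm just as much.

```shell
#!/usr/bin/env bash
# Compare bytes, not locale collation, so sort, comm and join agree on order.
export LC_ALL=C

# Hypothetical sample data: same keys, but "b" carries a trailing space in p.
printf 'a\nb \nc\n' > p
printf 'a\nb\nc\n' > q

comm -12 p q | wc -l    # whole-line comparison: "b " != "b", so 2
join p q | wc -l        # first-field comparison: "b" == "b", so 3

# Strip leading AND trailing whitespace; note the bracket expression [[:space:]].
# Re-sort afterwards, since removing characters can change the order and
# both comm and join expect sorted input.
for f in p q; do
  sed 's/^[[:space:]]*//; s/[[:space:]]*$//' "$f" | sort > "$f.norm"
done

comm -12 p.norm q.norm | wc -l   # now 3
join p.norm q.norm | wc -l       # still 3
```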