I have two large text files, file1.txt and file2.txt. Each file contains a newline-separated list of names, e.g.
file1.txt
Beth
james
James
paul
Paul
sally
file2.txt
James
Paul
Sally
I want to produce a file containing the names unique ONLY to file1.txt, ignoring case. In the above example, the produced file would look like this:
comparison.txt
Beth
Using the command
comm -23 file1.txt file2.txt > comparison.txt
produces an incorrect result:
Beth
james
paul
sally
Adding the -i option also produces an incorrect result:
Beth
James
Paul
What am I missing here?
You can use Awk with the POSIX-compatible string function tolower() to do a case-insensitive lookup.
awk 'FNR==NR{unique[tolower($0)]++; next}!(tolower($0) in unique)' file2.txt file1.txt
Beth
Redirect the output to a file comparison.txt as:
awk 'FNR==NR{unique[tolower($0)]++; next}!(tolower($0) in unique)' file2.txt file1.txt > comparison.txt
cat comparison.txt
Beth
The idea behind the logic is as follows. The first block,
FNR==NR{unique[tolower($0)]++; next}
runs only while file2.txt is being read (FNR==NR holds only for the first file argument), storing each line lower-cased as a key in the array unique until the end of file2.txt. Then, while reading file1.txt, the condition
!(tolower($0) in unique)
matches only those rows of file1.txt whose lower-cased form is not present in file2.txt, and Awk's default action prints them.
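To see why FNR==NR singles out the first file, here is a small sketch (not part of the solution, assuming the sample files above) that prints both counters for every input line. FNR resets to 1 at the start of each file, while NR keeps counting across files, so the two are equal only while file2.txt is being read:
awk '{print FILENAME, "FNR=" FNR, "NR=" NR}' file2.txt file1.txt
file2.txt FNR=1 NR=1
file2.txt FNR=2 NR=2
file2.txt FNR=3 NR=3
file1.txt FNR=1 NR=4
...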
Or, if you have access to GNU grep, use a negative match (-v) against the lines of file2.txt, with -i for a case-insensitive lookup:
grep -viFxf file2.txt file1.txt
Beth
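As with the Awk approach, redirect the output to produce the comparison file:
grep -viFxf file2.txt file1.txt > comparison.txt
cat comparison.txt
Beth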