
Compare 2 text files and output lines unique in only 1 file while ignoring case


I have 2 large text files, file1.txt and file2.txt.

Each file contains a newline-separated list of names, e.g.

file1.txt

Beth
james
James
paul
Paul
sally

file2.txt

James
Paul
Sally

I want to produce a file containing the names that appear ONLY in file1.txt, ignoring case. For the example above, the output file should look like this:

comparison.txt

Beth

Using the command comm -23 file1.txt file2.txt > comparison.txt produces an incorrect result:

Beth
james
paul
sally

Using the -i option also produces an incorrect result:

Beth
James
Paul

What am I missing here?


Solution

  • You can use Awk with the POSIX-compatible string function tolower to do a case-insensitive lookup.

    awk 'FNR==NR{unique[tolower($0)]++; next}!(tolower($0) in unique)' file2.txt file1.txt
    Beth
    

    Redirect the output to the file comparison.txt:

    awk 'FNR==NR{unique[tolower($0)]++; next}!(tolower($0) in unique)' file2.txt file1.txt > comparison.txt
    cat comparison.txt
    Beth
    

    The logic works as follows:

    1. FNR==NR{unique[tolower($0)]++; next} runs only while awk reads the first file, file2.txt, and stores each line, lowercased, as a key of the array unique.
    2. On file1.txt, !(tolower($0) in unique) is true for lines whose lowercased form is not a key of unique, so awk prints exactly the lines of file1.txt that have no case-insensitive match in file2.txt.
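
    The two steps above can be sketched as a commented, multi-line version of the same program. This is only an illustration: the scratch directory and the array name seen are assumptions made for the example, and the sample files are recreated from the question so the script can be run on its own.

    ```shell
    #!/bin/sh
    # Recreate the sample inputs from the question in a scratch directory
    # (the directory is an assumption of this sketch, not part of the answer).
    workdir=$(mktemp -d)
    cd "$workdir" || exit 1
    printf '%s\n' Beth james James paul Paul sally > file1.txt
    printf '%s\n' James Paul Sally > file2.txt

    awk '
        # Step 1: while reading the first file (file2.txt), FNR equals NR.
        FNR == NR {
            seen[tolower($0)] = 1   # remember the lowercased line as a key
            next                    # skip the filter rule below
        }
        # Step 2: for file1.txt, print lines whose lowercased form is not a key.
        !(tolower($0) in seen)
    ' file2.txt file1.txt > comparison.txt

    cat comparison.txt
    ```

    With the sample data this prints Beth. One caveat of the FNR==NR idiom: if file2.txt is empty, the condition also holds while reading file1.txt, so nothing is printed.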

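    (or) if you have access to GNU grep, use an inverted match (-v) with -i for case-insensitive lookup; -F treats each pattern as a fixed string, -x matches whole lines only, and -f file2.txt reads the patterns from file2.txt: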

    grep -viFxf file2.txt file1.txt
    Beth