bash awk grep duplicates compare

Remove only the exact number of repeated matches between two files


I want to get the remaining difference between two files that have redundant entries.

File1.txt:

Data1
Data1
Data2
Data2
Data3
Data3
Data3
Data3
Data4
Data5
Data6
Data6

and

File2.txt:

Data1
Data2
Data2
Data3
Data3
Data4
Data5
Data6

Desired Finalfile.txt:

Data1
Data3
Data3
Data6

In other words, if an entry shows up n times in File1.txt and m times in File2.txt, the final file should contain n - m copies of that entry. For example, there are four entries of Data3 in File1.txt and only two in File2.txt, so Finalfile.txt has two occurrences of Data3.

I've tried:

grep -v -f File1.txt File2.txt > Finalfile.txt

but it gives only the absolute difference, ignoring how many times each entry occurs.
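
As a minimal sketch of why this happens, the snippet below recreates the two sample files in a scratch directory (the `tmpdir` and `grep_out` names are only illustrative) and runs the same `grep`. Because `-v -f` discards every line of File2.txt that matches any pattern from File1.txt, regardless of how often that pattern occurs, nothing is left for this data:

```shell
# Scratch directory (hypothetical) so the sample files do not clobber real ones.
tmpdir=$(mktemp -d)
printf '%s\n' Data1 Data1 Data2 Data2 Data3 Data3 Data3 Data3 \
              Data4 Data5 Data6 Data6 > "$tmpdir/File1.txt"
printf '%s\n' Data1 Data2 Data2 Data3 Data3 Data4 Data5 Data6 > "$tmpdir/File2.txt"

# Every File2.txt line matches some pattern in File1.txt, so -v removes them all.
# "|| true" only hides grep's non-zero exit status for "no lines selected".
grep_out=$(grep -v -f "$tmpdir/File1.txt" "$tmpdir/File2.txt" || true)
echo "grep output: [$grep_out]"
```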


Solution

  • You may use this two-pass awk solution (one pass over each file):

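    awk '
    NR == FNR {        # first file: count each entry
       ++fq[$1]
       next
    }
    {                  # second file: subtract its occurrences
       --fq[$1]
    }
    END {
       # print each entry once per surplus occurrence;
       # for-in order is unspecified, so pipe to sort if order matters
       for (s in fq)
          for (i = 1; i <= fq[s]; ++i)
             print s
    }' File1.txt File2.txt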
    
    Data1
    Data3
    Data3
    Data6
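
The solution above prints entries in whatever order awk's `for (s in fq)` loop yields, which is not guaranteed to match the input. If the original order of File1.txt matters, one possible variant is sketched below. It is an assumption-laden alternative, not part of the original answer: it reads File2.txt first, compares whole lines (`$0`) rather than the first field (`$1`), and assumes File2.txt is not empty (otherwise `NR == FNR` would also hold for File1.txt). The `tmpdir`, `seen` and `remaining` names are only illustrative.

```shell
# Scratch directory (hypothetical) holding the sample data from the question.
tmpdir=$(mktemp -d)
printf '%s\n' Data1 Data1 Data2 Data2 Data3 Data3 Data3 Data3 \
              Data4 Data5 Data6 Data6 > "$tmpdir/File1.txt"
printf '%s\n' Data1 Data2 Data2 Data3 Data3 Data4 Data5 Data6 > "$tmpdir/File2.txt"

remaining=$(awk '
    # First file given (File2.txt): count how often each line occurs.
    NR == FNR { seen[$0]++; next }
    # Second file (File1.txt): cancel one occurrence per File2.txt occurrence.
    seen[$0] > 0 { seen[$0]--; next }
    # Whatever is left over is surplus; print it in File1.txt order.
    { print }
' "$tmpdir/File2.txt" "$tmpdir/File1.txt")

printf '%s\n' "$remaining"
```

For the sample data this leaves one Data1, two Data3 and one Data6 line; redirect the awk output to Finalfile.txt to save it.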