Compare two files and append the values, leave the mismatches as such in the output file


I'm trying to match two files, file1.txt (50,000 lines) and file2.txt (55,000 lines). I want to compare file2.txt against file1.txt: where an id (column 1) matches, take the values of columns 2 and 3 from file1.txt, and where it does not, leave the file2.txt line as it is. The output file must contain every id from file2.txt, i.e. it should have 55,000 lines. Note: not all the ids in file1.txt are present in file2.txt, so the actual number of matches could be less than 50,000.

file1.txt

ab1 12 345  
ab2 9 456  
gh67 6 987  

file2.txt

ab2 0 0  
ab1 0 345  
nh7 0 0  
gh67 6 987  

Output

ab2 9 456  
ab1 12 345  
nh7 0 0  
gh67 6 987 

This is what I tried, but it only prints the matches (so instead of 55,000 lines I have 49,000 lines in my output file):

awk "NR==FNR {f[$1]=$0;next}$1 in f{print f[$1],$0}" file1.txt file2.txt >output.txt

Solution

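  • This awk script will work:

    # First file (file1.txt): remember each line under its id (column 1).
    NR == FNR {
        a[$1] = $0
        next
    }
    # Second file (file2.txt), id known from file1.txt:
    # print the id with columns 2 and 3 as stored for file1.txt.
    $1 in a {
        split(a[$1], b)
        print $1, (b[2] == $2 ? $2 : b[2]), (b[3] == $3 ? $3 : b[3])
    }
    # Second file, id not in file1.txt: a pattern with no action
    # prints the line unchanged.
    !($1 in a)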
    

    If you save this as a.awk and run it on the two files from the question

    awk -f a.awk file1.txt file2.txt
    

    This will output

    ab2 9 456
    ab1 12 345
    nh7 0 0
    gh67 6 987
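
The same idea can also be sketched as a one-liner: store every file1.txt line by its id, then for each file2.txt line print the stored line if the id is known and the original line otherwise. This is only a sketch under a few assumptions: the scratch directory and the `workdir` variable are illustrative, the sample data is copied from the question, and a POSIX shell is assumed (hence the single quotes around the awk program, so the shell does not expand `$1`). Unlike the script above, it prints the file1.txt line verbatim instead of rebuilding it from its fields.

```shell
# Build the sample inputs from the question in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' 'ab1 12 345' 'ab2 9 456' 'gh67 6 987' > "$workdir/file1.txt"
printf '%s\n' 'ab2 0 0' 'ab1 0 345' 'nh7 0 0' 'gh67 6 987' > "$workdir/file2.txt"

# First pass (NR == FNR): remember each file1.txt line under its id.
# Second pass: print the stored file1.txt line for a known id,
# otherwise print the file2.txt line unchanged.
awk 'NR == FNR { f[$1] = $0; next }
     { print (($1 in f) ? f[$1] : $0) }' \
    "$workdir/file1.txt" "$workdir/file2.txt" > "$workdir/output.txt"

cat "$workdir/output.txt"

# Sanity check: the output should have as many lines as file2.txt.
wc -l < "$workdir/file2.txt"
wc -l < "$workdir/output.txt"
```

Because every file2.txt line produces exactly one output line, comparing the two `wc -l` counts is a quick way to confirm that no ids were dropped on the real 55,000-line file.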