Search code examples
joinawkcompareunion

Compare two files and make union


I tested a line below to compare 1st columns in 2 files and make an union. However the different value with identical 1st column in file2 was eliminated. Below I attached sample files, obtained result, and desired result.

awk -F, 'BEGIN{OFS=","}FNR==NR{a[$1]=$1","$2;next}($1 in a && $2=$2","a[$1])' file2.csv file1.csv >testout.txt

file1
John,red
John,blue
Mike,red
Mike,blue
Carl,red
Carl,blue

file2
John,V1
John,V2
Kent,V1
Kent,V2
Mike,V1
Mike,V2

obtained result
John,red,John,V2
John,blue,John,V2
Mike,red,Mike,V2
Mike,blue,Mike,V2

desired result
John,red,John,V1
John,red,John,V2
John,blue,John,V1
John,blue,John,V2
Mike,red,Kent,V1
Mike,red,Kent,V2
Mike,blue,Kent,V1
Mike,blue,Kent,V2

Solution

  • try this one-liner:

     awk -F, -v OFS="," 'NR==FNR{a[$0];next}{for(x in a)if(x~"^"$1FS)print $0,x}' file2 file1
    

    test:

    kent$  awk -F, -v OFS="," 'NR==FNR{a[$0];next}{for(x in a)if(x~"^"$1FS)print $0,x}' f2 f1
    John,red,John,V1
    John,red,John,V2
    John,blue,John,V1
    John,blue,John,V2
    Mike,red,Mike,V1
    Mike,red,Mike,V2
    Mike,blue,Mike,V1
    Mike,blue,Mike,V2