Search code examples
awk

How to compare two columns of two CSV files with awk?


I have two CSV files I need to compare against one column.

My member.csv file looks like:

ID|lastName|firstName
01|Lastname01|Firstname01
02|Lastname02|Firstname02

The second file check-ID.csv looks like:

Lastname01|Name01|pubID01|Hash01
Lastname02|Name02|pubID02|Hash02a
Lastname03|Name03|pubID03|Hash03
Lastname02|Name02|pubID02|Hash02b
Lastname01|Name01|pubID01|Hash01b

Lastname03 is not in my member.csv!

What I want is to check if the value of the first column of check-ID.csv is equal to any value of the second column in member.csv.

My attempt with script.awk is

NR==FNR{a[$1]=$1; b[$1]=$0; next} 
$2==a[$1]{ delete b[$1]}

END{for (i in b ) print b[i]}

executing with

awk -f script.awk check-ID.csv member.csv

The problem is that the result is not filtered.

I like to get a filtered and sorted output so only members are listed like this:

Lastname01|Name01|pubID01|Hash01
Lastname01|Name01|pubID01|Hash01b
Lastname02|Name02|pubID02|Hash02a
Lastname02|Name02|pubID02|Hash02b

Solution

  • Could you please try following. I think you were close only thing is you could change your Input_files reading sequence. Where I am reading members Input_file first and then check-ID.csv because later Input_file has all details in it which needs to be printed and we need to only check for 2nd field from members Input_file.

    awk '
    BEGIN{
      FS="|"
    }
    FNR==NR{
      a[$2]
      next
    }
    ($1 in a)
    ' members.csv check-ID.csv | 
    sort -t'|' -k1
    

    Explanation: Adding detailed explanation for above.

    awk '                             ##Starting awk program from here.
    BEGIN{                            ##Starting BEGIN section of this program from here.
      FS="|"                          ##Setting field separator as | here.
    }
    FNR==NR{                          ##Checking condition if FNR==NR which will be TRUE when first Input_file named members.csv is being read.
      a[$2]                           ##Creating array a with index 2nd field here.
      next                            ##next will skip all further statements from here.
    }
    ($1 in a)                         ##Checking condition if 1st field is preent in a then print that line.
    ' members.csv check-ID.csv |      ##Mentioning Input_file names here and sending its output to sort command.
    sort -t'|' -k1                    ##Sorting output(which we got from awk command above) by setting separator as | and by first field.