Search code examples
bashunixawkgawk

how to find out common columns and its records from two files using awk


I have two files:

File 1:

id|name|address|country
1|abc|efg|xyz
2|asd|dfg|uio

File 2(only headers):

id|name|country

Now, I want an output like:

OUTPUT:

id|name|country
1|abc|xyz
2|asd|uio

Basically, I have a user record file(file1) and a header file(file2).Now, I want to extract only those records from (file1) whose columns match with that in the header file.

I want to do this using awk or bash.

I tried using:

awk 'BEGIN { OFS="..."} FNR==NR { a[(FNR"")] = $0; next } { print a[(FNR"")], $0 > "test.txt"}' header.txt file.txt

and have no idea what to do next.

Thank You


Solution

  • Following awk may help you on same.

    awk -F"|" 'FNR==NR{for(i=1;i<=NF;i++){a[$i]};next} FNR==1 && FNR!=NR{for(j=1;j<=NF;j++){if($j in a){b[++p]=j}}} {for(o=1;o<=p;o++){printf("%s%s",$b[o],o==p?ORS:OFS)}}' OFS="|" File2  File1
    

    Adding a non-one liner form of solution too now.

    awk -F"|" '
    FNR==NR{
       for(i=1;i<=NF;i++){
         a[$i]};
       next}
    FNR==1 && FNR!=NR{
       for(j=1;j<=NF;j++){
         if($j in a){ b[++p]=j }}
    }
    {
       for(o=1;o<=p;o++){
         printf("%s%s",$b[o],o==p?ORS:OFS)}
    }
    ' OFS="|" File2  File1
    

    Edit by Ed Morton: FWIW here's the same script written with normal indenting/spacing and a couple of more meaningful variable names:

    BEGIN { FS=OFS="|" }
    NR==FNR {
        for (i=1; i<=NF; i++) {
            names[$i]
        }
        next
    }
    FNR==1 {
        for (i=1; i<=NF; i++) {
            if ($i in names) {
                f[++numFlds] = i
            }
        }
    }
    {
        for (i=1; i<=numFlds; i++) {
            printf "%s%s", $(f[i]), (i<numFlds ? OFS : ORS)
        }
    }