shell, unix, bioinformatics, vcf-variant-call-format

Extract rows from one file whose ID column matches the IDs listed in another file


I have two text files with thousands of rows. File A has only one column (ID):

#ID
rs111
rs222
rs333
rs444

File B looks like this:

#CHROM POS ID REF ALT QUAL ......
22 111 rs111 T C . ....
22 222 rs222 A G ....
22 333 rs666 G T ...
22 444 rs777 A A ..

This is the output I want:

#CHROM POS ID REF ALT QUAL ......
22 111 rs111 T C . ....
22 222 rs222 A G ....

That is, I want to extract only those rows from file B whose ID (third column) matches one of the IDs listed in file A. How can I achieve this? Thanks.


Solution

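  • You can use this awk command:

    awk 'FNR==NR{a[$1];next} ($3 in a)' fileA fileB

    Output:

    22 111 rs111 T C . ....
    22 222 rs222 A G ....

    FNR==NR is true only while awk reads the first file (fileA), so each ID there is stored as a key of the array a and the rest of the program is skipped. For fileB, a row is printed only when its third field ($3) is one of those keys.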
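Note that the one-liner above does not print the #CHROM header line, because its third field (ID) is not among the keys read from fileA, whereas the desired output in the question includes it. A minimal sketch of a variant that also keeps lines starting with "#" is shown here. The sample rows are simplified stand-ins for the question's data, and the output name filtered.txt is only an assumption for illustration.

```shell
# Hypothetical sample data mirroring the question (rows simplified, names assumed).
printf '%s\n' '#ID' 'rs111' 'rs222' 'rs333' 'rs444' > fileA
printf '%s\n' \
  '#CHROM POS ID REF ALT QUAL' \
  '22 111 rs111 T C .' \
  '22 222 rs222 A G .' \
  '22 333 rs666 G T .' \
  '22 444 rs777 A A .' > fileB

# Pass 1 (FNR==NR, i.e. while reading fileA): remember every ID as an array key.
# Pass 2 (fileB): print lines that start with "#" or whose 3rd field is a known ID.
awk 'FNR==NR { ids[$1]; next } /^#/ || ($3 in ids)' fileA fileB > filtered.txt

cat filtered.txt
```

Two caveats apply to this idiom: if fileA is empty, FNR==NR also holds for fileB and nothing useful is printed, and if fileA has Windows line endings the stored keys carry a trailing carriage return and will not match.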