Search code examples
awksplitfile-comparison

finding rows from file2 in file1 which have extended columns in file2


I have file1 as:

ABC CDEF HAGD CBDGCBAHS:ATSVHC
NBS JHA AUW MNDBE:BWJW
DKW QDW OIW KNDSK:WLKJW
BNSHW JBSS IJS BSHJA
ABC CDEF CBS 234:ATSVHC
DKW QDW FSD 634:WLKJW

and file2:

ABC CDEF HAGD CBDGCBAHS:ATSVHC:THE:123
NBS JHA AUW MNDBE:BWJW:THE:243
DKW QDW OIW KNDSK:WLKJW:THE:253
KAB GCBS YSTW SHSEB:AGTW:THE:193

I want to compare file 1 and file 2 based on column 1,2,3 and 4 except that column 4 in file2 has a bit of an extension to compare with, by using

awk 'FNR==NR{seen[$1,$2,$3,$4;next} ($1,$2,$3,$4) in seen' file1 file2

what can I tweak to make it comparable such that my output are the matched lines in file2 as:

ABC CDEF HAGD CBDGCBAHS:ATSVHC:THE:123
NBS JHA AUW MNDBE:BWJW:THE:243
DKW QDW OIW KNDSK:WLKJW:THE:253

Solution

  • Just include : in the FS:

    $ awk -F'[ :]' 'NR==FNR{a[$1,$2,$3,$4,$5];next} ($1,$2,$3,$4,$5) in a' file1 file2
    ABC CDEF HAGD CBDGCBAHS:ATSVHC:THE:123
    NBS JHA AUW MNDBE:BWJW:THE:243
    DKW QDW OIW KNDSK:WLKJW:THE:253