Search code examples
linuxunixjointext-processinggnu-coreutils

Join 2 files by common column header (without awk/sed)


Basically I want to get all records from file2, but filter out columns whose header doesn't appear in file1

Example:

file1

Name Location

file2

Name Phone_Number Location Email
Jim  032131       xyz      xyz@qqq.com
Tim  037903       zzz      zzz@qqq.com
Pimp 039141       xxz      xxz@qqq.com

Output

Name Location
Jim  xyz
Tim  zzz
Pimp xxz

Is there a way to do this without awk or sed, but still using coreutils tools? I've tried doing it with join, but couldn't get it working.


Solution

  • ALL_COLUMNS=$(head -n1 file2)
    for COLUMN in $(head -n1 file1); do
        JOIN_FORMAT+="2.$(( $(echo ${ALL_COLUMNS%%$COLUMN*} | wc -w)+1 )),"
    done
    join -a2 -o ${JOIN_FORMAT%?} /dev/null file2
    

    Explanation:

    ALL_COLUMNS=$(head -n1 file2)
    

    It saves all the column names to filter next


    for COLUMN in $(head -n1 file1); do
        JOIN_FORMAT+="2.$(( $(echo ${ALL_COLUMNS%%$COLUMN*} | wc -w)+1 )),"
    done
    

    For every column in file1, we look for the position of the one with the same name in file2 and append it to JOIN_FORMAT in the way of "2.<number_of_column>,"


    join -a2 -o ${JOIN_FORMAT%?} /dev/null file2
    

    Once we have the option string complete (2.1,2.3,), we pass it to join removing the last ,.

    join prints the unpairable lines from the second file provided (-a2 -> file2), but only the columns specified in the -o option.