linux unix join text-processing gnu-coreutils

Join 2 files by common column header (without awk/sed)

Basically I want to get all records from file2, but filter out columns whose header doesn't appear in file1

Example:

file1

Name Location

file2

Name Phone_Number Location Email
Jim  032131       xyz      xyz@qqq.com
Tim  037903       zzz      zzz@qqq.com
Pimp 039141       xxz      xxz@qqq.com

Output

Name Location
Jim  xyz
Tim  zzz
Pimp xxz

Is there a way to do this without awk or sed, but still using coreutils tools? I've tried doing it with join, but couldn't get it working.

Solution

ALL_COLUMNS=$(head -n1 file2)
for COLUMN in $(head -n1 file1); do
    JOIN_FORMAT+="2.$(( $(echo ${ALL_COLUMNS%%$COLUMN*} | wc -w)+1 )),"
done
join -a2 -o ${JOIN_FORMAT%?} /dev/null file2

Explanation:

ALL_COLUMNS=$(head -n1 file2)

It saves all the column names to filter next

for COLUMN in $(head -n1 file1); do
    JOIN_FORMAT+="2.$(( $(echo ${ALL_COLUMNS%%$COLUMN*} | wc -w)+1 )),"
done

For every column in file1, we look for the position of the one with the same name in file2 and append it to JOIN_FORMAT in the way of "2.<number_of_column>,"

join -a2 -o ${JOIN_FORMAT%?} /dev/null file2

Once we have the option string complete (2.1,2.3,), we pass it to join removing the last ,.

join prints the unpairable lines from the second file provided (-a2 -> file2), but only the columns specified in the -o option.