Basically I want to get all records from file2
, but filter out columns whose header doesn't appear in file1
Example:
file1
Name Location
file2
Name Phone_Number Location Email
Jim 032131 xyz xyz@qqq.com
Tim 037903 zzz zzz@qqq.com
Pimp 039141 xxz xxz@qqq.com
Output
Name Location
Jim xyz
Tim zzz
Pimp xxz
Is there a way to do this without awk
or sed
, but still using coreutils
tools? I've tried doing it with join
, but couldn't get it working.
ALL_COLUMNS=$(head -n1 file2)
for COLUMN in $(head -n1 file1); do
JOIN_FORMAT+="2.$(( $(echo ${ALL_COLUMNS%%$COLUMN*} | wc -w)+1 )),"
done
join -a2 -o ${JOIN_FORMAT%?} /dev/null file2
ALL_COLUMNS=$(head -n1 file2)
It saves all the column names to filter next
for COLUMN in $(head -n1 file1); do
JOIN_FORMAT+="2.$(( $(echo ${ALL_COLUMNS%%$COLUMN*} | wc -w)+1 )),"
done
For every column in file1
, we look for the position of the one with the same name in file2
and append it to JOIN_FORMAT
in the way of "2.<number_of_column>,"
join -a2 -o ${JOIN_FORMAT%?} /dev/null file2
Once we have the option string complete (2.1,2.3,
), we pass it to join
removing the last ,
.
join
prints the unpairable lines from the second file provided (-a2 -> file2
), but only the columns specified in the -o
option.