I have 2 data tables as shown (they are 2 x tab-delimited files). I'm trying to populate the Table-2 Country column with the corresponding country from Table-1. Need to "join" from info in Table-2's Firstname field.
What's the best approach here, given the complexity of the data in the Table-2, Firstname column? Would other Mac tools be better to use than AWK e.g. Excel formulae, Perl, Filemaker etc?
TABLE1 (Input):
city_ascii country iso2
Mavinga Angola AO
Menongue Angola AO
Mucusso Angola AO
Guines Cuba CU
Havana Cuba CU
Holguin Cuba CU
Las Tunas Cuba CU
Manzanillo Cuba CU
Matanzas Cuba CU
Moron Cuba CU
Santa Clara Cuba CU
Varadero Cuba CU
TABLE2 (Input):
Firstname
Fred, Havana
James, (Varadero, Cuba)
Jack (Cuba)
Harry Varadero, Cuba
Josh Cuba
Gary, Mavinga & Other, Angola
Jamie, (Angola)
TABLE2 (Result):
Firstname Country
Fred, Havana Cuba
James, (Varadero, Cuba) Cuba
Jack (Cuba) Cuba
Harry Varadero, Cuba Cuba
Josh Cuba Cuba
Gary, Mavinga & Other, Angola Angola
Jamie, (Angola) Angola
============ Here is debugging info in answer to Ed's Qs below:
awk -F'\t' '{print NF"<"$1"><"$2"><"$3">"}' Table3.txt | cat -v
1<city_ascii country iso2><><>
1<Mavinga Angola AO><><>
1<Menongue Angola AO><><>
1<Mucusso Angola AO><><>
1<Guines Cuba CU><><>
1<Havana Cuba CU><><>
1<Holguin Cuba CU><><>
1<Las Tunas Cuba CU><><>
1<Manzanillo Cuba CU><><>
1<Matanzas Cuba CU><><>
1<Moron Cuba CU><><>
1<Santa Clara Cuba CU><><>
1<Varadero Cuba CU><><>
==============
awk -F'\t' '{print NF"<"$1"><"$2"><"$3">"}' Table4.txt | cat -v
1<Firstname><><>
1<Fred, Havana><><>
1<James, (Varadero, Cuba)><><>
1<Jack (Cuba)><><>
1<Harry Varadero, Cuba><><>
1<Josh Cuba><><>
1<Gary, Mavinga & Other, Angola><><>
1<Jamie, (Angola)><><>
===============
cat -v tst.awk
BEGIN { FS=OFS="\t" }
NR==FNR {
map[$1] = $2
map[$2] = $2
next
}
FNR==1 {
print
FS=" "
next
}
{
orig = $0
country = ""
gsub(/[^[:alpha:]]/," ")
for (i=NF; i>0; i--) {
if ($i in map) {
country = map[$i]
break
}
}
print orig, country
}
===============
awk -f tst.awk Table3.txt Table4.txt >output.txt
Firstname
Fred, Havana
James, (Varadero, Cuba)
Jack (Cuba)
Harry Varadero, Cuba
Josh Cuba
Gary, Mavinga & Other, Angola
Jamie, (Angola)
================
awk -F'\t' '{print NF"<"$1"><"$2"><"$3">"}' output.txt | cat -v
1<Firstname><><>
2<Fred, Havana><><>
2<James, (Varadero, Cuba)><><>
2<Jack (Cuba)><><>
2<Harry Varadero, Cuba><><>
2<Josh Cuba><><>
2<Gary, Mavinga & Other, Angola><><>
2<Jamie, (Angola)><><>
It sounds like this might be what you're looking for:
$ cat tst.awk
BEGIN { FS=OFS="\t" }
NR==FNR {
map[$1] = $2
map[$2] = $2
next
}
FNR==1 {
print
FS=" "
next
}
{
orig = $0
country = ""
gsub(/[^[:alpha:]]/," ")
for (i=NF; i>0; i--) {
if ($i in map) {
country = map[$i]
break
}
}
print orig, country
}
$ awk -f tst.awk file1 file2
Firstname Country
Fred, Havana Cuba
James, (Varadero, Cuba) Cuba
Jack (Cuba) Cuba
Harry Varadero, Cuba Cuba
Josh Cuba Cuba
Gary, Mavinga & Other, Angola Angola
Jamie, (Angola) Angola