I have two CSV files. 1.csv
files has 718 entries and 2.csv
has 68000 entries.
#cat 1.csv
#Num #Name
1 BoB
2 Jack
3 John
4 Hawk
5 Scot
...........
#cat 2.csv
#Num #Name
1 BoB
2 John
3 Linda
4 Hawk
5 Scot
........
I knew how to compare two files,when only one column(Names
) is available in both and to get the matching names
.
#comm -12 <(sort 1.csv) <(sort 2.csv)
Now i would like to check, If Num
in 1.csv
is matching with Num
in 2.csv
, What is the associated "Names" from both the csv
files for that matched Num
?
Result :
1,Bob,Bob
2,Jack,John
3,John,Linda
4,Hawk,Hawk
5,Scot,Scot
..........
How to do achieve this using comm
?
You can use the join
command to perform inner join on 2 csv files on the 1st field i.e the number. Here is an example:
$ cat f1.csv
1 BoB
2 Jack
3 John
4 Hawk
5 Scot
6 ExtraInF1
$ cat f2.csv
1 BoB
3 Linda
4 Hawk
2 John
5 Scot
7 ExtraInF2
$ join <(sort -t ' ' -k 1 f1.csv) <(sort -t ' ' -k 1 f2.csv)
1 BoB BoB
2 Jack John
3 John Linda
4 Hawk Hawk
5 Scot Scot
$ join <(sort -t ' ' -k 1 f1.csv) <(sort -t ' ' -k 1 f2.csv) | tr -s ' ' ,
1,BoB,BoB
2,Jack,John
3,John,Linda
4,Hawk,Hawk
5,Scot,Scot
$
Note I have added few dummy rows(number 6 and 7) and also note that they haven't appeared in the output as they aren't present in both files.
<(sort -t ' ' -k 1 f1.csv)
means process substitution i.e substitute the output of the process at this place. sort
with delimiter as space(-t ' '
) and on 1st key i.e 1st column(-k 1
) and join
by default performs inner join on 1st column of both files.