I have multiple files that I want to merge by column using a shell script, say file a.txt and file b.txt. File a.txt contains sorted, unique values, and its first column will be used as the reference value.
Example:
# cat a.txt
001|johan
002|mike
003|adam
# cat b.txt
001|chu
001|stewart
002|lewis
002|jordan
003|lambert
003|johnson
003|smith
003|long
Both files will be combined to produce the output below.
# cat c.txt
001|johan chu
001|johan stewart
002|mike lewis
002|mike jordan
003|adam lambert
003|adam johnson
003|adam smith
003|adam long
I tried a simple iteration using a while loop:
while read -r line
do
  ids=$(echo "$line" | awk -F'|' '{print $1}')
  fn=$(grep "^$ids|" a.txt)
  echo "$fn|$line" | awk -F'|' '{print $1 "|" $2 " " $4}'
done < b.txt > c.txt
But it takes a lot of time with millions of rows, since grep rescans a.txt from the top for every single line of b.txt.
In MySQL we could easily achieve this with a JOIN clause, but the data would have to be loaded/inserted first.
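Just to illustrate that point, here is a rough sketch of the load-then-join workflow, using sqlite3 as a stand-in for MySQL (this assumes the sqlite3 CLI is available); the two .import lines are exactly the loading overhead in question:
sqlite3 <<'EOF'
.separator |
CREATE TABLE a (id TEXT, name TEXT);
CREATE TABLE b (id TEXT, name TEXT);
.import a.txt a
.import b.txt b
SELECT a.id, a.name || ' ' || b.name FROM a JOIN b ON a.id = b.id ORDER BY a.id;
EOF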
A faster approach would probably be the paste command, but as far as I know paste requires both files to have the same number of rows. I could pad a.txt first to make the counts match, but that still consumes a lot of time while the script runs.
Maybe someone has a better approach.
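If both files stay sorted on the key, as in the samples above, join from coreutils is built for exactly this kind of keyed merge; unlike paste it pairs rows by key rather than by position, so the row counts need not match. A sketch (the sed call just turns the second | of each output line into a space to match the format above):
join -t'|' a.txt b.txt | sed 's/|/ /2'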
You could also put it all in one awk script:
awk -F'|' '{if (NR==FNR) a[$1]=$2; else print $1 "|" a[$1] " " $2}' a.txt b.txt
001|johan chu
001|johan stewart
002|mike lewis
002|mike jordan
003|adam lambert
003|adam johnson
003|adam smith
003|adam long
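Here NR==FNR is true only while awk is reading the first file (a.txt), so that pass just fills the array a with the name for each ID; for every line of the second file (b.txt), the else branch prints the ID, the stored name, and the current name. Because a[$1] is a hash lookup, the whole merge is a single pass over each file, which is why it scales to millions of rows.
Note that a line of b.txt whose ID does not appear in a.txt would still be printed, just with an empty name. If you would rather skip such lines (an assumption about the desired behavior), a minimal variant using awk's standard in test:
awk -F'|' 'NR==FNR {a[$1]=$2; next} $1 in a {print $1 "|" a[$1] " " $2}' a.txt b.txt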