I have the following file called file1:
j_2,2019-04-01,100.00,,,,11
j_7,2019-04-05,-9.78,,,,11
j_1,2019-04-01,20.00,,,,22
j_4,2019-04-04,130.00,,,,22
j_8,2019-04-05,-250.00,,,,22
j_1,2019-04-01,100.00,,,,33
j_3,2019-04-01,-264.56,,,,33
j_5,2019-04-04,130.00,,,,44
j_6,2019-04-04,-12.90,,,,55
j_9,2019-04-08,100.00,,,,66
and file2
11,Accounting,VE
22,Paint,VE
33,Legal,VE
44,Office,VE
55,Maintenance,VE
66,IT,VE
I want to join file1 (on column 7) and file2 (on column 1) to create file3.
j_2,2019-04-01,100.00,,,,Accounting,VE
j_7,2019-04-05,-9.78,,,,Accounting,VE
j_1,2019-04-01,20.00,,,,Paint,VE
j_4,2019-04-04,130.00,,,,Paint,VE
j_8,2019-04-05,-250.00,,,,Paint,VE
j_1,2019-04-01,100.00,,,,Legal,VE
j_3,2019-04-01,-264.56,,,,Legal,VE
j_5,2019-04-04,130.00,,,,Office,VE
j_6,2019-04-04,-12.90,,,,Maintenance,VE
j_9,2019-04-08,100.00,,,,IT,VE
I have tried
join -t ',' -1 7 -2 1 file1 file2 | head -n 2
which results in
11,j_2,2019-04-01,100.00,,,,Accounting,VE
11,j_7,2019-04-05,-9.78,,,,Accounting,VE
Why does a 11, appear in the front and how can I run the join command to get the required result. Furthermore, I also want to be able to do this in awk. I have looked on the internet for solutions and found this but don't understand it fully to apply it to my context. Also would I be able to pass file2 and file1 as variables as I really want to be using this as part of a bash script.
awk 'BEGIN {FS=OFS=","} NR==FNR {h[$1] = $2; next} {print $0,h[$1]}' file2 file1
Help is always much appreciated!
Answering OP's question about why the join column is showing up at the beginning of the line ...
If no explicit format (-o FORMAT
) is provided then join
uses a default ouput format (from man join
):
Default FORMAT outputs the join field, (then) the remaining fields from FILE1, (then) the remaining fields from FILE2
OP will need to provide an explicit format which can get tedious if there are several columns (join
does not provide any sort of wildcard/range/regex designators). One example
join -t',' -1 7 -2 1 -o 1.1,1.2,1.3,1.4,1.5,1.6,2.2,2.3 file1 file2
Another issue which doesn't show up in OP's sample head -2
output, but is mentioned in the ccomments, is that join
requires both files to be sorted by the join column.
In my system OP's join
comand generates the following error strewn output:
$ join -t ',' -1 7 -2 1 file1 file2
22,j_1,2019-04-01,20.00,,,,Paint,VE
join: file1:3: is not sorted: j_2,2019-04-01,100.00,,,,11
33,j_1,2019-04-01,100.00,,,,Legal,VE
44,j_5,2019-04-04,130.00,,,,Office,VE
55,j_6,2019-04-04,-12.90,,,,Maintenance,VE
66,j_9,2019-04-08,100.00,,,,IT,VE
join: input is not in sorted order
Pulling this all together into a single join
command:
$ join -t ',' -1 7 -2 1 -o 1.1,1.2,1.3,1.4,1.5,1.6,2.2,2.3 <(sort -t ',' -k7,7 file1) <(sort -t ',' -k1,1 file2)
j_2,2019-04-01,100.00,,,,Accounting,VE
j_7,2019-04-05,-9.78,,,,Accounting,VE
j_1,2019-04-01,20.00,,,,Paint,VE
j_4,2019-04-04,130.00,,,,Paint,VE
j_8,2019-04-05,-250.00,,,,Paint,VE
j_1,2019-04-01,100.00,,,,Legal,VE
j_3,2019-04-01,-264.56,,,,Legal,VE
j_5,2019-04-04,130.00,,,,Office,VE
j_6,2019-04-04,-12.90,,,,Maintenance,VE
j_9,2019-04-08,100.00,,,,IT,VE
NOTE: if you know for a fact that a file is already sorted by the join column then you can replace the associated <(sort -t ',' -kx,x file)
with file
Because of the hassles of having to explicitly list all output columns I'd probably opt for something in awk
...
One awk
idea:
awk '
BEGIN { FS=OFS="," }
FNR==NR { suffix[$1] = $2 OFS $3; next } # 1st file: store data in array suffix[]
{ if ($NF in suffix) { # 2nd file: if last column ($NF==$7 in this case) is an index in the suffix[] array ...
for (i=1;i<NF;i++) # loop through the 1st to (NF-1)th columns ...
printf "%s%s", $i, OFS # print each one to stdout then ...
print suffix[$NF] # finish off current line of output with the value stored in the array location
}
}
' file2 <(sort -t ',' -k7,7 file1) # no need to sort file2 as we will just be doing lookups against the suffix[] array;
# sort file1 so the output is generated in the same order as OP's desired result
This generates:
j_2,2019-04-01,100.00,,,,Accounting,VE
j_7,2019-04-05,-9.78,,,,Accounting,VE
j_1,2019-04-01,20.00,,,,Paint,VE
j_4,2019-04-04,130.00,,,,Paint,VE
j_8,2019-04-05,-250.00,,,,Paint,VE
j_1,2019-04-01,100.00,,,,Legal,VE
j_3,2019-04-01,-264.56,,,,Legal,VE
j_5,2019-04-04,130.00,,,,Office,VE
j_6,2019-04-04,-12.90,,,,Maintenance,VE
j_9,2019-04-08,100.00,,,,IT,VE