I'm trying to decompose a field from one file into an array, and then check whether each term appears in a second file (which I have already stored in another array). The goal is to merge information from both files.
The first file, file1 (the one with the field I want to split), looks like this:
data1=data2=data3 some more stuff
data4=data1 this are things
data2=data5 more text here
...
file2, on the other hand, has this structure:
data1 10
data2 20
data3 35
data4 15
data5 60
I want to split the first field of file1 using = as the separator, then look up each of the resulting terms in the second file, and print everything in the following format (output):
data1=data2=data3 some more stuff 10
data1=data2=data3 some more stuff 20
data1=data2=data3 some more stuff 35
data4=data1 this are things 15
data4=data1 this are things 10
data2=data5 more text here 20
data2=data5 more text here 60
So far, I've got this:
awk 'NR==FNR {
    l[$1] = $2; next
} {
    la=split($1,a,"=")
    for(x=1;x<=la;x++)
        print $0,l[a[$x]]
}' file2 file1 > output
First (when NR==FNR), I store the file2 data in the array l, using the first field as the key.
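A minimal sketch of just this first step, assuming the sample file2 shown above. The scratch directory (created with mktemp) and the END block that queries the table are additions for illustration only:

```shell
#!/bin/sh
# Work in a scratch directory so the sample file does not clobber anything.
cd "$(mktemp -d)" || exit 1

# Sample file2 from the question: key, then value.
printf '%s\n' 'data1 10' 'data2 20' 'data3 35' 'data4 15' 'data5 60' > file2

# While the first file is being read, NR (total records) equals FNR
# (records in the current file), so each line is stored as l[key] = value.
lookup=$(awk 'NR==FNR { l[$1] = $2; next }
              END     { print l["data3"], l["data5"] }' file2)
echo "$lookup"    # prints: 35 60
```

One caveat with the NR==FNR idiom: if file2 were empty, NR==FNR would become true for the records of file1 instead, and they would be loaded into l by mistake.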
Then I process the next file as follows: for each record, I split the field $1 into the array a, using = as the separator. The variable la stores the number of terms in a (split returns that count).
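The return value of split is easy to check in isolation. This one-liner feeds the first sample line of file1 to awk and prints the count followed by the three pieces:

```shell
#!/bin/sh
# split() fills the array a with the "="-separated pieces of $1
# and returns how many pieces it found.
result=$(echo 'data1=data2=data3 some more stuff' |
         awk '{ la = split($1, a, "="); print la, a[1], a[2], a[3] }')
echo "$result"    # prints: 3 data1 data2 data3
```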
For each element of array a (the for loop), I look up the corresponding key in array l and print the current record followed by the l value.
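Here is a sketch of just the lookup loop, with l hard-coded instead of loaded from file2. The key data6 is hypothetical, chosen so that one term is missing from l, and the (a[x] in l) test with its "NA" placeholder is an optional variation, not part of the original code. The test can matter because in awk merely referencing l[a[x]] for a missing key silently creates an empty entry in l.

```shell
#!/bin/sh
# l is hard-coded here (hypothetical values) instead of being read from file2.
out=$(echo 'data4=data6 this are things' |
      awk 'BEGIN { l["data4"] = 15 }
           {
               la = split($1, a, "=")
               for (x = 1; x <= la; x++)
                   # NA is a hypothetical placeholder for keys missing from l
                   print $0, ((a[x] in l) ? l[a[x]] : "NA")
           }')
echo "$out"
# prints:
# data4=data6 this are things 15
# data4=data6 this are things NA
```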
But for some reason I only get the content from file1 (current, unwanted output):
data1=data2=data3 some more stuff
data1=data2=data3 some more stuff
data1=data2=data3 some more stuff
data4=data1 this are things
data4=data1 this are things
data2=data5 more text here
data2=data5 more text here
Any ideas on what might be wrong with my code?
Thanks a lot!
I found the answer myself. It was an issue with how I referenced the loop variable.
This is the correct code:
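awk 'NR==FNR {
    l[$1] = $2; next              # file2: store the value under its key
} {
    la=split($1,a,"=")            # file1: split the first field on "="
    for(x=1;x<=la;x++)
        print $0,l[a[x]]          # x is the loop counter; $x would be a field
}' file2 file1 > output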
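To check the corrected command end to end, this sketch recreates the sample file1 and file2 from the question in a scratch directory, runs the command, and diffs the result against the expected output (the file name expected is an addition used only for the comparison):

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# Sample inputs copied from the question.
cat > file1 <<'EOF'
data1=data2=data3 some more stuff
data4=data1 this are things
data2=data5 more text here
EOF
cat > file2 <<'EOF'
data1 10
data2 20
data3 35
data4 15
data5 60
EOF

# The corrected command: load file2 into l, then look up each "="-separated term.
awk 'NR==FNR { l[$1] = $2; next }
     { la = split($1, a, "="); for (x = 1; x <= la; x++) print $0, l[a[x]] }' file2 file1 > output

# Expected result, copied from the question.
cat > expected <<'EOF'
data1=data2=data3 some more stuff 10
data1=data2=data3 some more stuff 20
data1=data2=data3 some more stuff 35
data4=data1 this are things 15
data4=data1 this are things 10
data2=data5 more text here 20
data2=data5 more text here 60
EOF

diff expected output && echo "output matches"
```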
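The key is in the print statement. It now reads print $0,l[a[x]] instead of print $0,l[a[$x]]. The loop uses x as its counter, whereas $x is the x-th field of the current record. With a[$x], awk was looking up a["data1=data2=data3"], a["some"] and a["more"] for the first line, none of which exist, so l[...] came back empty and only $0 was printed. Using a[x] now points to the correct key in array l (from file2).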
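The difference between x and $x is easy to see by printing both for the first sample line. This sketch is purely diagnostic; the brackets around a[$x] are there only to make its empty value visible:

```shell
#!/bin/sh
# For each x, show the counter, the field $x, and what a[x] and a[$x] resolve to.
trace=$(echo 'data1=data2=data3 some more stuff' |
        awk '{
            la = split($1, a, "=")
            for (x = 1; x <= la; x++)
                printf "x=%d  $x=%s  a[x]=%s  a[$x]=[%s]\n", x, $x, a[x], a[$x]
        }')
echo "$trace"
# prints:
# x=1  $x=data1=data2=data3  a[x]=data1  a[$x]=[]
# x=2  $x=some  a[x]=data2  a[$x]=[]
# x=3  $x=more  a[x]=data3  a[$x]=[]
```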
I'm leaving the post up because this question doesn't seem to have been asked before. Please tell me if you think it isn't useful.
Thanks!