I'm confused about how GNU Parallel handles piped input fed into a sed in-place file edit, and I'd like to understand what it's doing (and also get it to work!).
I have two files, f1 and f2 that look as follows:
f1
a11 a12 a13
a21 a22 a23
...
an1 an2 an3
f2
a41
stuff
...
a91
stuff
...
and what I'm trying to do is append the second and third columns from f1 to each matching first-column element that appears in f2, so that f2 looks like:
a41 a42 a43
stuff
...
a91 a92 a93
things
...
A simple while loop does the job:
while IFS=$'\t' read -r e1 e2 e3; do sed -i "s/$e1/& $e2 $e3/g" f2 ; done < f1
And I tried to replicate this using GNU Parallel like so:
cat f1 | parallel --colsep '\t' -q sed -i "s/{1}/& {2} {3}/g" f2
which modifies only a fraction of the entries in f2 compared to the while loop. It would look something like this:
a41 a42 a43
stuff
...
a91
things
...
a71 a72 a73
words
...
So, any ideas about what's happening, and how I can replicate the while loop behaviour using GNU Parallel?
Thanks!
This happens because sed -i does not really edit the file in place. It writes a new file and then moves that file over the original.
So what you are seeing is multiple sed processes running in parallel, each creating its own new file. When one of them finishes, it overwrites the original, but this will not be seen by the other sed processes that are still running: they keep working on the version of the file they opened, and when they finish they overwrite the earlier result.
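A minimal sketch of how to see the "new file" behaviour for yourself, assuming GNU sed and a throwaway file with made-up content: if the inode number of f2 changes after sed -i, the file was replaced rather than modified.

```shell
# Scratch directory with a throwaway file (made-up content, not the real f2).
cd "$(mktemp -d)"
printf 'a41\nstuff\n' > f2

before=$(ls -i f2 | awk '{print $1}')   # inode number before the edit
sed -i 's/a41/& a42 a43/' f2            # GNU sed "in-place" edit
after=$(ls -i f2 | awk '{print $1}')    # inode number after the edit

# Different inode numbers mean f2 was replaced by a new file, not modified.
echo "before=$before after=$after"
```

With several of these running at once, each one starts from the same old contents and the last rename wins, which matches the partial results in the question.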
So if you use -j1 you will not see this problem. But you will also not see a speedup.
I am not sure GNU Parallel can help you here. A solution is to convert f1 into one big sed script and apply it to f2 in a single pass.
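A rough sketch of the sed-script idea, assuming GNU sed, tab-separated fields in f1, and fields that contain no characters special to sed (such as /, &, \, . or *). The sample data and the file name edits.sed are made up for illustration.

```shell
# Scratch directory with small stand-ins for f1 and f2 (made-up data).
cd "$(mktemp -d)"
printf 'a41\ta42\ta43\na91\ta92\ta93\n' > f1
printf 'a41\nstuff\na91\nstuff\n' > f2

# Turn each tab-separated line of f1 into one sed substitution command.
# Assumes the fields contain no characters that are special to sed.
awk -F'\t' '{ printf "s/%s/& %s %s/g\n", $1, $2, $3 }' f1 > edits.sed

# Apply all substitutions in a single pass, so only one process rewrites f2.
sed -i -f edits.sed f2

cat f2
```

Because a single sed process reads f2 once and applies every substitution, there is no race between writers, and it should also be much faster than starting one sed per line of f1.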