bash sed while-loop parallel-processing gnu-parallel

GNU Parallel produces different output compared to while loop with this sed command

I'm confused about how GNU Parallel handles piped input fed into a sed in-place file edit, and I'd like to understand what it's doing (and also get it to work!).

I have two files, f1 (tab-separated) and f2, that look as follows, respectively:

a11    a12    a13
a21    a22    a23
...
an1    an2    an3

a41
stuff
...
a91
stuff
...

What I'm trying to do is append the second and third columns of f1 to each matching first-column element that appears in f2, so that f2 looks like:

a41 a42 a43
stuff
...
a91 a92 a93
things
...

A simple while loop does the job:

while IFS=$'\t' read -r e1 e2 e3; do sed -i "s/$e1/& $e2 $e3/g" f2 ; done < f1

And I tried to replicate this using GNU Parallel like so:

cat f1 | parallel --colsep '\t' -q sed -i "s/{1}/& {2} {3}/g" f2

which modifies only a fraction of the entries in f2 compared to the while loop. It would look something like this:

a41 a42 a43
stuff
...
a91
things
...
a71 a72 a73
words
...

So, any ideas about what's happening, and how I can replicate the while loop behaviour using GNU Parallel?

Thanks!

Solution

This happens because sed -i does not actually edit the file in place. It writes the result to a new temporary file, which is then renamed over the original.
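One way to observe this replace-by-rename behaviour is to compare the inode number of the file before and after the edit. This is only a sketch: it assumes GNU sed (where -i takes no argument), works in a scratch directory, and uses made-up file contents.

```shell
# Work in a scratch directory so no real file is touched.
cd "$(mktemp -d)"

# A tiny stand-in for f2 (contents are invented for the demo).
printf 'a41\nstuff\n' > f2

# Record the inode number of f2 before the edit.
before=$(ls -i f2 | awk '{print $1}')

# "In-place" edit with GNU sed.
sed -i 's/a41/& a42 a43/' f2

# Record the inode number again after the edit.
after=$(ls -i f2 | awk '{print $1}')

if [ "$before" != "$after" ]; then
    echo "f2 was replaced by a new file"
else
    echo "f2 was modified in place"
fi
```

A changed inode number means that f2 is now a different file from the one that was opened for reading.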

So what you are seeing is multiple seds running in parallel, each creating its own new file from the f2 it read when it started. When one of them finishes, it overwrites the original, but this will not be seen by the other seds currently running: they are still working on the old copy, and whichever finishes later overwrites the earlier result.

So if you use -j1 you will not see this problem, but you will also not see a speed-up.

I am not sure GNU Parallel can help you here. A solution is to convert f1 into one big sed script and run it against f2 in a single pass.
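A minimal sketch of the single-script idea: one substitution command is generated per line of f1, and the resulting script is run once over f2. The sample data and the names edits.sed and f2.new are made up for illustration. The sketch also assumes that the fields in f1 contain no "/" or regex metacharacters; otherwise they would need escaping first.

```shell
# Work in a scratch directory so no real file is touched.
cd "$(mktemp -d)"

# Small stand-ins for f1 (tab-separated) and f2; the values are invented.
printf 'a41\ta42\ta43\na91\ta92\ta93\n' > f1
printf 'a41\nstuff\na91\nstuff\n' > f2

# Turn every line of f1 into one sed command, e.g. s/a41/& a42 a43/g
awk -F'\t' '{ printf "s/%s/& %s %s/g\n", $1, $2, $3 }' f1 > edits.sed

# Apply all substitutions in one pass, then replace f2 with the result.
# With GNU sed, "sed -i -f edits.sed f2" is a shorter equivalent.
sed -f edits.sed f2 > f2.new && mv f2.new f2

cat f2
```

Because sed runs the commands of a script in order on every input line, the result should match what the while loop produces, while f2 is read and written only once.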