
GNU parallel: deleting a line from the joblog stops parallel from updating it


If you run GNU parallel with --joblog path/to/logfile and then delete a line from that logfile while parallel is running, GNU parallel is no longer able to append entries for jobs that complete afterwards.

Execute this MWE:

#!/usr/bin/bash

parallel -j1 -n0 --joblog log sleep 1 ::: $(seq 10) &

sleep 5 && sed -i '$ d' log

If you tail -f log prior to execution, you can see that parallel keeps writing to this file. However, if you cat log after 10 seconds, you will see that nothing was written to the actual file now on disk after the third entry or so.

What's the reason behind this? Is there a way to delete something from the file and still have GNU parallel be able to write to it?

Some background as to why this happened:

Using GNU parallel, I started a few jobs on remote machines with --sshloginfile. I then needed to pkill a few jobs on one of the machines because a colleague needed to use it (and I subsequently removed that machine from the sshloginfile so that parallel wouldn't reuse it for new runs). If you pkill the processes started on the remote machine, they get an Exitval of 0: it looks as if they finished without issues, and you can't tell that they were killed. I wanted to remove them from the joblog immediately so that, when I later restart parallel with --resume, it can look at the joblog and determine what's missing.

Turns out, this was a bad idea, as now my joblog is useless.


Solution

  • While @MarkSetchell is absolutely right in his comment, the root problem here is that man sed is lying:

    -i[SUFFIX], --in-place[=SUFFIX]
              edit files in place (makes backup if SUFFIX supplied)
    

    sed -i does not edit files in place.

    What it does is make a temporary file in the same dir, copy the input file to the temporary file while doing the editing, and finally rename the temporary file to the input file's name. Similar to this:

    sed '$ d' log > sedXxO11P
    mv sedXxO11P log
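The inode swap is easy to observe directly. A minimal sketch, assuming GNU sed and GNU coreutils stat (where stat -c %i prints a file's inode number) on Linux:

```bash
#!/usr/bin/env bash
# Show that sed -i replaces log with a new file (new inode) instead of editing it.
set -euo pipefail

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir"
printf '1\n2\n3\n' > log

before=$(stat -c %i log)   # GNU stat: %i is the inode number
sed -i '$ d' log           # GNU sed: delete the last line "in place"
after=$(stat -c %i log)

echo "inode before: $before"
echo "inode after:  $after"  # differs: the name log now points to another file
cat log                      # the edited content: 1 and 2
```

The two numbers always differ, because sed writes the temporary file while the original still exists, so the two cannot share an inode.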
    

    It is clear that the original log and sedXxO11P have different inodes; let us call them ino1 and ino2. GNU Parallel has ino1 open and does not know that ino2 exists. After the mv, the name log points to ino2, so GNU Parallel happily keeps appending to ino1, completely unaware that ino1 no longer has a name and will vanish when it closes the file, because it has already been unlinked.
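The same effect can be reproduced without GNU parallel. In this sketch a shell file descriptor (fd 3) stands in for the joblog handle that parallel keeps open; that substitution is an assumption made for illustration. It also explains the tail -f observation in the question: GNU tail -f follows the open descriptor by default, so it keeps showing lines written to the orphaned inode, whereas tail -F follows the name.

```bash
#!/usr/bin/env bash
# A long-running writer (standing in for GNU parallel) that keeps its log open.
set -euo pipefail

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir"

exec 3>>log                  # fd 3 stays open on the original inode (ino1)
echo "job 1" >&3
echo "job 2" >&3

sed -i '$ d' log             # the name log now refers to a new inode (ino2) holding "job 1"

echo "job 3" >&3             # still appended to ino1, which no longer has a name
exec 3>&-                    # closing fd 3 frees ino1 and its contents for good

cat log                      # prints only: job 1
```

While the writer is still running, the orphaned contents can usually still be read on Linux through /proc/PID/fd/N, where ls -l shows the link target marked as (deleted).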

    So you need to change the content of the file without changing the inode:

    #!/usr/bin/bash

    seq 10 | parallel -j1 -n0 --joblog log sleep 1 &

    sleep 5

    # Obvious race condition here:
    # Anything appended to log before sed is done is lost.
    # This can be avoided by suspending parallel while running this.
    tmp=$RANDOM$$
    cp log "$tmp"
    # Open $tmp as stdin, remove its name, then let sed rewrite log
    # through '>' (truncate), which keeps log's inode.
    (rm "$tmp"; sed '$ d' >log) < "$tmp"

    wait
    cat log
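The comment in the script suggests suspending parallel to close the race. One way to do that is SIGSTOP/SIGCONT. The helper below is a hypothetical sketch (the name edit_in_place_paused is made up here), assuming the joblog is only written by the paused process. Instead of the rm trick, it writes the edited copy to a scratch file first and then copies it back with >, which still truncates and rewrites the same inode.

```bash
#!/usr/bin/env bash
# edit_in_place_paused PID FILE SED_SCRIPT
# Hypothetical helper (not part of GNU parallel): pause process PID, apply
# SED_SCRIPT to FILE without replacing FILE's inode, then resume PID.
edit_in_place_paused() {
    local pid=$1 file=$2 script=$3 tmp status=0
    tmp=$(mktemp) || return 1

    # While stopped, PID cannot append to FILE, so nothing is lost mid-edit.
    kill -STOP "$pid" || { rm -f -- "$tmp"; return 1; }

    # Edit into a scratch file first, so a bad sed script leaves FILE untouched;
    # then '>' truncates FILE and rewrites it, keeping the same inode.
    { sed "$script" "$file" > "$tmp" && cat "$tmp" > "$file"; } || status=$?

    rm -f -- "$tmp"
    kill -CONT "$pid"        # PID carries on writing to the same FILE
    return "$status"
}
```

With the script above, one would call edit_in_place_paused "$!" log '$ d' in place of the tmp/cp/sed lines; $! is the PID of parallel, the last command of the background pipeline. Stopping only the parallel process does not stop jobs it has already started; they keep running and should be logged once parallel is resumed.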
    

    This works right now, but do not expect it to ever be a supported feature.
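For the situation in the question, none of this is needed once parallel has stopped: the inode problem only exists while parallel holds the joblog open, so afterwards the file can be edited any way you like, sed -i included. A hypothetical sketch (drop_host_from_joblog is a made-up name) that removes every entry run on one host, assuming the tab-separated joblog layout with a header line and Host in the second column, and that the host is passed exactly as it appears in that column:

```bash
#!/usr/bin/env bash
# drop_host_from_joblog JOBLOG HOST
# Hypothetical helper. Assumes GNU parallel is NOT running, and that JOBLOG is
# tab-separated with a header line and Host in column 2:
#   Seq Host Starttime JobRuntime Send Receive Exitval Signal Command
drop_host_from_joblog() {
    local joblog=$1 host=$2 tmp status
    tmp=$(mktemp) || return 1
    # Keep the header and every row whose Host column is not HOST.
    awk -F'\t' -v host="$host" 'NR == 1 || $2 != host' "$joblog" > "$tmp" &&
        cat "$tmp" > "$joblog"
    status=$?
    rm -f -- "$tmp"
    return "$status"
}
```

Afterwards, rerunning the same command with --resume and the same --joblog should, as the question expects, start again the jobs whose sequence numbers are no longer in the log.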