GNU Parallel - Multiple arguments


Using GNU parallel, I am trying to run a sub-sampling script that takes two input files and outputs a subsampled file. I am using this command:

parallel -j+0 --eta python sub_sample_.2.py ::: file1 file2 ::: file3 file4 ::: file5 file6

But there's no ETA on the command line, i.e.:

Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete
ETA: 0s Left 8 AVG:0.00s local:8/0/100%/0.0

Also only the first four files are processed, but not the last two: file5 and file6.


Solution

  • parallel -j+0 --eta python sub_sample_.2.py ::: file1 file2 ::: file3 file4 ::: file5 file6
    

    Each ::: adds an input source with 2 values, so this command generates 2*2*2 = 8 jobs in total.

    Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete
    ETA: 0s Left 8 AVG:0.00s local:8/0/100%/0.0
    

    The ETA is computed from the runtimes of jobs that have finished. Here no jobs have finished yet, so there is no ETA. You can also see that all 8 jobs are running on your local system; since -j+0 runs one job per CPU core, you likely have 8 or more cores.

    "Also only the first four files are processed, but not the last two: file5 and file6."

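    Written this way, I suspect you might not be aware of what multiple ::: do: GNU Parallel generates every combination of the input sources, so each job receives three arguments. Run the command with --dryrun to print the commands without executing them, and see if that is what you expect to be run.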
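
As a rough illustration of what multiple ::: do, the Python sketch below mimics how the input sources get combined: every value from one source is paired with every value from the others. It only models the argument combinations, not GNU Parallel itself, and the function name `combos` is made up for this example.

```python
from itertools import product

# The three input sources from the original command, one list per ':::'.
sources = [["file1", "file2"], ["file3", "file4"], ["file5", "file6"]]

def combos(input_sources):
    """Return one argument tuple per job: the Cartesian product of the sources."""
    return list(product(*input_sources))

jobs = combos(sources)
for args in jobs:
    # Roughly the command line each job would get.
    print("python sub_sample_.2.py", *args)
print(len(jobs), "jobs")
```

Every job gets three file names here, so file5 and file6 are passed along, but only as a third argument, which a script expecting two inputs would presumably ignore.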

    My guess is that what you really want to run is (requires version 20160422 or later):

    parallel --eta python sub_sample_.2.py ::: file1 file3 file5 :::+ file2 file4 file6
    

    Or:

    parallel --xapply --eta python sub_sample_.2.py ::: file1 file3 file5 ::: file2 file4 file6
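
Unlike the original command, both suggested commands link the input sources position by position instead of combining them. A minimal Python sketch of that pairing follows. It models only the arguments and assumes the files are meant to be processed as the pairs (file1, file2), (file3, file4), (file5, file6); `linked` is a hypothetical helper name.

```python
first = ["file1", "file3", "file5"]   # values after ':::'
second = ["file2", "file4", "file6"]  # values after ':::+' (or the second ':::' with --xapply)

def linked(a, b):
    """Pair the n-th value of one source with the n-th value of the other."""
    return list(zip(a, b))

pairs = linked(first, second)
for args in pairs:
    # One job per pair: three jobs, two arguments each.
    print("python sub_sample_.2.py", *args)
```

Note that `zip` stops at the shorter list, which is fine here because both sources have three values; GNU Parallel's handling of sources with unequal lengths may differ.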