java terminal parallel-processing bigdata gnu-parallel

Running GNU Parallel Java Job


Is this the correct way to execute a Java job with the input myFile.txt? I want to run the MyJavaClass program with the input passed in as args[0], but I want to run it locally on my machine across multiple cores rather than on a cluster.

parallel java MyJavaClass ::: myFile.txt

EDIT:

What I want to accomplish is the following:

java MyJavaClass arg1 arg2 arg3 
java MyJavaClass arg4 arg5 arg6
java MyJavaClass arg7 arg8 arg9  

and I would like these jobs to run in parallel.


Solution

  • If myFile.txt has millions of lines, and you want to split it into one chunk per CPU core and run MyJavaClass on each chunk, and assuming MyJavaClass reads from stdin (standard input) and prints to stdout (standard output), then with 3 chunks the commands would look something like this:

    cat chunk1 | java MyJavaClass > output1
    cat chunk2 | java MyJavaClass > output2
    cat chunk3 | java MyJavaClass > output3
    

    then with GNU Parallel it looks like this (--block -1 picks the block size so that each job slot gets one block):

    parallel -a myFile.txt --pipepart --block -1 java MyJavaClass > combined_output
    

    If MyJavaClass instead takes a filename, so that the 3 lines look like this:

    java MyJavaClass chunk1 > output1
    java MyJavaClass chunk2 > output2
    java MyJavaClass chunk3 > output3
    

    then this may work:

    # --fifo is fast, but may not work if MyJavaClass seeks into the file
    parallel -a myFile.txt --pipepart --fifo --block -1 java MyJavaClass {} > combined_output
    # --cat creates temporary files
    parallel -a myFile.txt --pipepart --cat --block -1 java MyJavaClass {} > combined_output
    

    If MyJavaClass writes its output to a named file, so that the 3 lines look like this:

    java MyJavaClass chunk1 --output-file chunk1.output
    java MyJavaClass chunk2 --output-file chunk2.output
    java MyJavaClass chunk3 --output-file chunk3.output
    

    then you can use the fact that {#} is the job number, and thus unique, to name the output files:

    parallel [...] java MyJavaClass {} --output-file {#}.output
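
The variants above differ only in how MyJavaClass receives its input: on stdin, or as a filename in args[0]. The real MyJavaClass is not shown in the question, so the following is only a minimal sketch of a class that would fit both cases. The per-line work in transform() is a placeholder assumption, and the names transform and process are hypothetical. It reads its input strictly front to back without seeking, which is the access pattern the --fifo variant relies on.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Locale;

// Hypothetical stand-in for MyJavaClass; the real class is not shown in the question.
class MyJavaClass {

    // Placeholder per-line work (an assumption): upper-case the line.
    static String transform(String line) {
        return line.toUpperCase(Locale.ROOT);
    }

    // Reads lines sequentially and writes one result line per input line.
    // Nothing here seeks, so the input may be a pipe or a FIFO.
    static void process(BufferedReader in, PrintStream out) {
        in.lines().forEach(line -> out.println(transform(line)));
    }

    public static void main(String[] args) throws IOException {
        // With a filename in args[0], read that file (the {} case above);
        // with no arguments, read stdin (the plain --pipepart case above).
        try (BufferedReader in = args.length > 0
                ? Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8)
                : new BufferedReader(new InputStreamReader(System.in, StandardCharsets.UTF_8))) {
            process(in, System.out);
        }
    }
}
```

Because each job only sees its own chunk, a class written this way should produce the same lines whether it is run once on the whole file or once per chunk, although the order of chunks in combined_output may differ from the order in the input file.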