Is this the correct way to execute a Java job with the input myFile.txt? I want to run the MyJavaClass program with the input passed in args[0], but I want to run it locally on my machine on multiple cores rather than on a cluster.
parallel java MyJavaClass ::: myFile.txt
EDIT:
What I want to accomplish is the following:
java MyJavaClass arg1 arg2 arg3
java MyJavaClass arg4 arg5 arg6
java MyJavaClass arg7 arg8 arg9
and I would like these jobs to run in parallel.
If you have myFile.txt with millions of lines and you want it split into one chunk per CPU core, with MyJavaClass run on each chunk, and we assume that MyJavaClass reads from stdin (standard input) and prints to stdout (standard output), then the 3 lines would look something like this:
cat chunk1 | java MyJavaClass > output1
cat chunk2 | java MyJavaClass > output2
cat chunk3 | java MyJavaClass > output3
then it looks like this using GNU Parallel:
parallel -a myFile.txt --pipepart --block -1 java MyJavaClass > combined_output
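For the --pipepart form to work, MyJavaClass has to consume its chunk from standard input and write its results to standard output. A minimal sketch of such a filter, assuming each line can be processed independently; the `transform` and `process` helpers and the upper-casing are hypothetical placeholders for the real work:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.util.Locale;

// Hypothetical stand-in for the real MyJavaClass: a line-by-line stdin-to-stdout filter.
class MyJavaClass {
    // Placeholder for the real per-line work (assumption: lines are independent).
    static String transform(String line) {
        return line.toUpperCase(Locale.ROOT);
    }

    // Reads every line from `in`, transforms it, and writes the result to `out`.
    static void process(BufferedReader in, PrintWriter out) {
        in.lines().forEach(line -> out.println(transform(line)));
        out.flush();
    }

    public static void main(String[] args) {
        // Each parallel job receives one chunk of myFile.txt on its stdin.
        process(new BufferedReader(new InputStreamReader(System.in)),
                new PrintWriter(System.out));
    }
}
```

Because each job only ever sees its own chunk, this only gives the right answer when a line can be handled without knowledge of the lines in other chunks.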
If MyJavaClass instead takes a filename so the 3 lines look like this:
java MyJavaClass chunk1 > output1
java MyJavaClass chunk2 > output2
java MyJavaClass chunk3 > output3
then this may work:
# --fifo is fast, but may not work if MyJavaClass seeks into the file
parallel -a myFile.txt --pipepart --fifo --block -1 java MyJavaClass {} > combined_output
# --cat creates temporary files
parallel -a myFile.txt --pipepart --cat --block -1 java MyJavaClass {} > combined_output
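Whether --fifo is safe depends on how MyJavaClass reads the file named in args[0]. A minimal sketch of a reader that makes a single forward pass and never seeks, which is the access pattern a FIFO supports; the `countNonEmptyLines` helper and the line-counting task are hypothetical, chosen only to illustrate the pattern:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical stand-in for the real MyJavaClass: takes a filename as its first argument.
class MyJavaClass {
    // Placeholder work: count non-empty lines, reading strictly front to back.
    static long countNonEmptyLines(String path) {
        try (BufferedReader in = Files.newBufferedReader(Paths.get(path))) {
            return in.lines().filter(line -> !line.isEmpty()).count();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // args[0] is whatever GNU Parallel substitutes for {}:
        // a named pipe with --fifo, a temporary file with --cat.
        System.out.println(countNonEmptyLines(args[0]));
    }
}
```

A program that instead jumps around in the file (for example via RandomAccessFile) or reads it twice would need the --cat variant, since a named pipe can only be read once, in order.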
If MyJavaClass outputs to a filename, so the 3 lines look like this:
java MyJavaClass chunk1 --output-file chunk1.output
java MyJavaClass chunk2 --output-file chunk2.output
java MyJavaClass chunk3 --output-file chunk3.output
you can then use the fact that {#} is the job number and thus unique:
parallel [...] java MyJavaClass {} --output-file {#}.output