Tags: bash, gnu-parallel

parallel processing with arguments from file


I have a 10-line file apps.txt, where each line contains the information (app ID, API key, and secret key) for one of 10 applications. The fields on each line are arguments to a program that interacts with a server. Another file, data.txt, contains the input data for the program. I want to start one instance of the program per line in apps.txt and split data.txt among those instances for processing. How can I do this with GNU Parallel? I tried the command below but can't get the desired behavior:

cat data.txt | parallel [-N1] -j10 --pipe --no-run-if-empty --line-buffer ./program.py {1} {2} {3} :::: apps.txt

apps.txt
AppID1 API_Key1 Secret_Key1
AppID2 API_Key2 Secret_Key2
...
AppID10 API_Key10 Secret_Key10


Solution

  • I interpret your question as: you have 10 workers and you want to distribute blocks of stdin among them.

    Use GNU Parallel's slot replacement string together with an array whose index identifies the worker. Bash arrays are indexed from 0, while slot() counts from 1, so subtract 1 from slot().

    # Set each entry in array 'worker' to one line from apps
    parset worker echo :::: apps.txt
    doit() {
      workerid="$1"
      echo "do stuff on ${worker[$workerid]}"
      # Read stuff from stdin and do 'wc' on that
      wc
    }
    # env_parallel is needed to get $worker exported
    # -j10 must be the number of lines in apps.txt
    cat data.txt | env_parallel -j10 --pipe doit '{= $_=slot()-1 =}'
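The slot()-1 indexing is the easy part to get wrong: GNU Parallel numbers job slots starting at 1, while Bash arrays start at 0. A minimal sketch of the mapping, with the worker array filled by hand as a stand-in for what parset would read from apps.txt:

```shell
# Stand-in for the array that 'parset worker echo :::: apps.txt' would fill
worker=("AppID1 API_Key1 Secret_Key1" "AppID2 API_Key2 Secret_Key2")

# slot() is 1-based; Bash array indices are 0-based,
# so slot 1 must map to index 0, slot 2 to index 1, and so on.
for slot in 1 2; do
  echo "slot $slot -> ${worker[$((slot-1))]}"
done
```

With the real apps.txt, each of the 10 slots is pinned to one line of credentials for the lifetime of the run, so every block of data.txt that --pipe hands to a slot is processed with that slot's application.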