I have to process ten very big files. Each file takes about two days to process with my_profiler. I can parallelize the work so that my_profiler runs on each file separately, using all of my system's cores. My approach to parallelizing the work is to run three processes in three different terminals at a time. I can't process more than four files at once, or my system becomes unresponsive (hangs).

My goal is to write a shell script which processes the ten files in batches of three. Once processing of one file finishes, its terminal should be closed and processing of a new file should start in another terminal. As the terminal I want to use gnome-terminal.
Currently I am stuck with the following script, which runs all processes in parallel:
for j in $jobs
do
gnome-terminal -- bash -c "my_profiler $j"
done
How can I wait until a shell script running in an instance of gnome-terminal finishes?

My first thought was that I might need to send a signal from the old terminals once their job is finished.
I am not quite sure why you have to start a new gnome-terminal for each job. But you could use xargs in combination with -P [1]. Running three my_profiler processes in parallel at the same time:

echo "${jobs}" | xargs -P3 -I{} gnome-terminal --wait -e 'bash -c "my_profiler {}"'
Important here is to start gnome-terminal with --wait; otherwise the terminal daemonizes itself, which has the effect that xargs immediately starts the next process. --wait was introduced in gnome-terminal 3.27.1.
The -I{} option to xargs defines a placeholder ({}) which xargs will replace with a filename before running the command [2]. In the example above, xargs scans the command string (gnome-terminal --wait -e 'bash -c "my_profiler {}"') for {} and replaces every instance with the first file coming from stdin (echo "${jobs}" | ...). It then executes the resulting string. xargs will do this three times (-P3) before it starts waiting for at least one process to finish. When one does, xargs starts the next process.
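The substitution mechanism can be observed with a harmless command in place of gnome-terminal (a sketch; printf stands in for the echo "${jobs}" feed, and the input names are made up):

```shell
# Each input line replaces {} once; with -P3 up to three copies run at a time.
# Output order is indeterminate because the three jobs run concurrently.
printf '%s\n' one two three | xargs -P3 -I{} sh -c 'echo "processing {}"'
```

Note that -I implies one command invocation per input line, which is exactly what you want here: one terminal per file.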
[1]: from man xargs

-P max-procs, --max-procs=max-procs
    Run up to max-procs processes at a time; the default is 1. If max-procs is 0, xargs will run as many processes as possible at a time. Use the -n option or the -L option with -P; otherwise chances are that only one exec will be done. While xargs is running, you can send its process a SIGUSR1 signal to increase the number of commands to run simultaneously, or a SIGUSR2 to decrease the number. You cannot increase it above an implementation-defined limit (which is shown with --show-limits). You cannot decrease it below 1. xargs never terminates its commands; when asked to decrease, it merely waits for more than one existing command to terminate before starting another.

    Please note that it is up to the called processes to properly manage parallel access to shared resources. For example, if more than one of them tries to print to stdout, the output will be produced in an indeterminate order (and very likely mixed up) unless the processes collaborate in some way to prevent this. Using some kind of locking scheme is one way to prevent such problems. In general, using a locking scheme will help ensure correct output but reduce performance. If you don't want to tolerate the performance difference, simply arrange for each process to produce a separate output file (or otherwise use separate resources).
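That last caveat applies directly here: several my_profiler processes writing to the same stream would interleave their output. Redirecting each job into its own log file sidesteps the problem (a sketch; echo stands in for my_profiler, and the {}.log naming is an assumption):

```shell
# One log file per job, so concurrently running jobs never share stdout.
# logdir is a throwaway directory; a real run would pick a fixed location.
logdir=$(mktemp -d)
printf '%s\n' a b c d | xargs -P3 -I{} sh -c "echo 'result for {}' > $logdir/{}.log"
```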
[2]: from man xargs

-I replace-str
    Replace occurrences of replace-str in the initial-arguments with names read from standard input. Also, unquoted blanks do not terminate input items; instead the separator is the newline character. Implies -x and -L 1.
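If the extra terminals turn out to be unnecessary after all, the same batching can be done with bash job control alone (a sketch, assuming bash 4.3+ for wait -n; sleep and echo stand in for my_profiler, and the done_log file only records finished jobs for illustration):

```shell
#!/usr/bin/env bash
# Keep at most three background jobs running; start the next job as soon
# as any one of them finishes, so the batches refill continuously.
files="f1 f2 f3 f4 f5 f6 f7 f8 f9 f10"
max=3
done_log=$(mktemp)
for j in $files; do
    # If $max jobs are already running, block until one of them exits.
    while (( $(jobs -rp | wc -l) >= max )); do
        wait -n              # bash 4.3+: returns when any single job exits
    done
    { sleep 0.1; echo "$j" >> "$done_log"; } &   # stand-in for: my_profiler "$j"
done
wait                          # drain the final batch
```

This avoids the gnome-terminal --wait requirement entirely, at the cost of losing the per-job terminal windows.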