loops, parallel-processing, csh, gnu-parallel

Fake parallelization in script over loop (foreach line) without substantial changes in code


I am new to GNU Parallel and would be glad to have any errors and misunderstandings pointed out. I have read the manual, but it mostly covers single-stage operations, where the "action" (unpacking, moving, etc.) is specified directly in GNU Parallel syntax. It says nothing about multi-stage cases, where several actions have to be performed without changing the code significantly (if that is possible at all).

Is it possible to "fake" parallel processing in code that does not support it? The code takes a list of files (in any format) and at some point loops over that list. All I need is for the code to perform its actions (no matter which ones) on all files simultaneously rather than sequentially, without changing the code substantially, or changing it only around line 138 (see below). This kind of parallel processing does not need to split files or anything like that; it only needs to process all the files at once.

As an example, here is the part of the code I am interested in (full code here: line 138, GMT):

# <code> actions (see full code - link below) and check input file availability
#loop
#
  foreach line (`awk '{print $0}' $1`)
# <code> actions (see full code - link below)
end

Source, full code: GMT

Maybe this can be implemented with tools other than GNU Parallel? Any help is welcome, preferably with an example. Making the whole script parallel would probably cause problems; parallelism is needed only at the loop.

Thanks


Solution

  • csh has many limitations; lack of functions is one of them, and any script that's longer than a few lines will quickly turn into a spaghetti mess. This is an important reason why scripting in csh is typically discouraged.

    That being said, the easiest way to modify this is to extract the loop body out to a separate script and call that with & appended. For example:

    main.csh:

    #!/bin/csh
    
    foreach line (`awk '{print $0}' $1`)
        ./loop.csh "$line" &
    end
    

    loop.csh:

    #!/bin/csh
    
    set line = "$1"
    echo "=> $line"
    sleep 5
    

    You may need to add more parameters than just $line; I didn't check the entire script.

    The & will make the shell continue without waiting for the command to finish. So if there are 5,000 lines you will be running 5,000 processes at the same time. To exercise some control over the number of simultaneous processes you could use the parallel tool instead of a loop:

    #!/bin/csh
    
    awk '{print $0}' $1 | parallel ./loop.csh
    
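    If GNU Parallel is not installed, xargs with the -P option (available in GNU and BSD xargs, though not required by POSIX) can cap the number of simultaneous workers in much the same way. The following is a minimal sketch, written in POSIX sh rather than csh; the worker script and the input list are hypothetical stand-ins for loop.csh and the file passed as $1, created in temporary files only so the example is self-contained:

    ```shell
    #!/bin/sh
    # Hypothetical worker standing in for loop.csh: prints its argument after a delay.
    worker=$(mktemp)
    printf '%s\n' '#!/bin/sh' 'sleep 1' 'echo "=> $1"' > "$worker"

    # Hypothetical input list, one item per line.
    list=$(mktemp)
    printf '%s\n' a b c d > "$list"

    # -n 1: pass one item to each worker invocation
    # -P 2: keep at most two workers running at the same time
    # Output order depends on which worker finishes first, so sort it for display.
    result=$(xargs -n 1 -P 2 sh "$worker" < "$list" | sort)
    echo "$result"

    rm -f "$worker" "$list"
    ```

    In the real script the pipeline itself would look something like awk '{print $0}' $1 | xargs -n 1 -P 2 ./loop.csh. Note that xargs splits its input on whitespace, much as the backquoted foreach list does, so this assumes the lines contain no spaces or quotes.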

    Or if you want to stick with loops you can use pgrep to limit the maximum number of simultaneous processes:

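    foreach line (a b c d e f g h i)
        # Wait here until fewer than 3 copies of loop.csh are running.
        # (A "continue" at this point would move on to the next item
        # and silently skip the current one.)
        set numprocs = `pgrep -c loop.csh`
        while ( $numprocs > 2 )
            sleep 2
            set numprocs = `pgrep -c loop.csh`
        end
    
        ./loop.csh "$line" &
    end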
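
    With either loop variant, the main script reaches the line after the loop as soon as the last job has been launched, not when it has finished. If later steps of the script read what the workers produce, a wait after the loop blocks until all background jobs have exited; csh has a wait builtin for this as well. A minimal sketch in POSIX sh rather than csh, where process_line is a hypothetical stand-in for loop.csh and the temporary file exists only to collect output for the demonstration:

    ```shell
    #!/bin/sh
    # Hypothetical stand-in for loop.csh.
    process_line() {
        sleep 1
        echo "=> $1"
    }

    # Temporary file that collects worker output (demonstration only).
    out=$(mktemp)

    for line in a b c; do
        process_line "$line" >> "$out" &
    done

    # Block until every background job started above has exited.
    wait

    # Only now is it safe to use the results.
    count=$(wc -l < "$out" | tr -d ' ')
    echo "finished jobs: $count"
    rm -f "$out"
    ```

    Without the wait, the line count could be taken while some workers are still running and would come out too low.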