Search code examples
bashgnu-parallel

GNU Parallel and Bash functions: How to run the simple example from the manual


I'm trying to learn GNU Parallel because I have a case where I think I could easily parallelize a bash function. So in trying to learn, I went to the GNU Parallel manual where there is an example...but I can't even get it working! To wit:

(232) $ bash --version
GNU bash, version 4.1.2(1)-release (x86_64-redhat-linux-gnu)
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>

This is free software; you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
(233) $ cat tpar.bash
#!/bin/bash

echo `which parallel`
doit() {
  echo Doing it for $1
  sleep 2
  echo Done with $1
}
export -f doit
parallel doit ::: 1 2 3
doubleit() {
  echo Doing it for $1 $2
  sleep 2
  echo Done with $1 $2
}
export -f doubleit
parallel doubleit ::: 1 2 3 ::: a b

(234) $ bash tpar.bash
/home/mathomp4/bin/parallel
doit: Command not found.
doit: Command not found.
doit: Command not found.
doubleit: Command not found.
doubleit: Command not found.
doubleit: Command not found.
doubleit: Command not found.
doubleit: Command not found.
doubleit: Command not found.

As you can see, I can't even get the simple example to run. Thus, I'm probably doing something amazingly stupid and basic...but I'm at a loss.

ETA: As suggested by commenters (chmod +x, set -vx):

(27) $ ./tpar.bash

echo `which parallel`
which parallel
++ which parallel
+ echo /home/mathomp4/bin/parallel
/home/mathomp4/bin/parallel

doit() {
  echo Doing it for $1
  sleep 2
  echo Done with $1
}
export -f doit
+ export -f doit
parallel doit ::: 1 2 3
+ parallel doit ::: 1 2 3
doit: Command not found.
doit: Command not found.
doit: Command not found.
doubleit() {
  echo Doing it for $1 $2
  sleep 2
  echo Done with $1 $2
}
export -f doubleit
+ export -f doubleit
parallel doubleit ::: 1 2 3 ::: a b
+ parallel doubleit ::: 1 2 3 ::: a b
doubleit: Command not found.
doubleit: Command not found.
doubleit: Command not found.
doubleit: Command not found.
doubleit: Command not found.
doubleit: Command not found.

ETA2: Note, I can, in the script, just call 'doit 1', say, and it will do that. So the function is valid, it just isn't...exported?


Solution

  • You cannot call a shell function from outside the shell where it was defined. A shell function is a concept inside the shell. The parallel command itself has no way to access it.

    Calling export -f doit in bash exports the function via the environment so that it is picked up by child processes. But only bash understands bash functions. A (grand)*child bash process can call it, but not other programs, for example not other shells.

    Going by the message “Command not found”, it appears that your preferred shell is (t)csh. You need to tell parallel to invoke bash instead. parallel invokes the shell indicated by the SHELL environment variable¹, so set it to point to bash.

    export SHELL=$(type -p bash)
    doit () { … }
    export -f doit
    parallel doit ::: 1 2 3
    

    If you only want to set SHELL for the execution of the parallel command and not for the rest of the script:

    doit () { … }
    export -f doit
    SHELL=$(type -p bash) parallel doit ::: 1 2 3
    

    I'm not sure how to deal with remote jobs, you may need to pass --env=SHELL in addition to --env=doit (note that this assumes that the path to bash is the same everywhere).

    And yes, this oddity should be mentioned more prominently in the manual. There's a brief note in the description of the command argument, but it isn't very explicit (it should explain that the command words are concatenated with a space as a separator and then passed to $SHELL -c), and SHELL isn't even listed in the environment variables section. (I encourage you to report this as a bug; I'm not doing it because I hardly ever use this program.)

    ¹ which is bad design, since SHELL is supposed to indicate a user interface preference for an interactive command line shell, and not to change the behavior of programs.