bash xargs gnu-parallel

Calling a shell function using parallel with a list of quoted filenames as input


Using Bash.

I have an exported shell function which I want to apply to many files.

Normally I would use xargs, but syntax like this (see here) is too ugly to use.

...... | xargs -n 1 -P 10 -I {} bash -c 'echo_var "$@"' _ {}

In that discussion, parallel had an easier syntax:

..... | parallel -P 10 echo_var {}

Now I have run into the following problem: the list of files to which I want to apply my function is a list of files on one line, each quoted and separated by spaces thus: "file 1" "file 2" "file 3".

How can I feed this space-separated, quoted list into parallel?

I can replicate the list using echo for testing.

e.g.

echo '"file 1" "file 2" "file 3"'|parallel -d " " my_function {}

but I can't get this to work.

How can I fix it?


Solution

  • How can I fix it?

    You have to choose a unique separator.

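    # printf '%s' is used instead of echo so that no trailing newline ends up in the last item
    printf '%s' 'file 1|file 2|file 3' | xargs -d "|" -n1 bash -c 'my_function "$@"' --
    printf '%s' 'file 1^file 2^file 3' | parallel -d "^" my_function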
    

    The safest choice is to use the zero byte (NUL) as the separator:

    printf 'file 1\0file 2\0file 3\0' | xargs -0 -n1 bash -c 'my_function "$@"' --
    printf "%s\0" 'file 1' 'file 2' 'file 3' | parallel -0 my_function
    

    Best of all, store your elements in a bash array and process them as a zero-separated stream:

    files=("file 1" "file 2" "file 3")
    printf "%s\0" "${files[@]}" | xargs -0 -n1 bash -c 'my_function "$@"' --
    printf "%s\0" "${files[@]}" | parallel -0 my_function
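
    The commands above assume that my_function is already exported. A minimal end-to-end sketch of the array approach follows; the body of my_function is a hypothetical stand-in (it only prints its first argument in brackets), not the asker's real function.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the asker's function: prints its first argument in brackets.
my_function() {
    printf '[%s]\n' "$1"
}
# Child bash processes started by xargs only see the function if it is exported.
export -f my_function

files=("file 1" "file 2" "file 3")

# Each array element becomes one NUL-terminated record, so embedded spaces survive.
# The trailing "--" fills $0 of the inner bash; the filename arrives as $1.
printf '%s\0' "${files[@]}" | xargs -0 -n1 bash -c 'my_function "$@"' --
```

    Without -P the calls run one after another, so the three bracketed names appear in array order.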
    

    Note that empty arrays will still run the function, without any arguments. To avoid running the function when the input is empty, use the -r (--no-run-if-empty) option. It is supported by parallel and is a GNU extension in xargs (xargs on BSD and macOS does not have --no-run-if-empty).
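
    The effect of -r can be checked with completely empty input. This is a small sketch that assumes GNU xargs; the placeholder command "echo ran" is only there to count invocations.

```shell
#!/usr/bin/env bash
# Assumes GNU xargs. Count how many times the command runs on completely empty input.
without_r=$(printf '' | xargs -0 echo ran | wc -l | tr -d ' ')
with_r=$(printf '' | xargs -0 -r echo ran | wc -l | tr -d ' ')

echo "without -r: $without_r run(s)"
echo "with -r:    $with_r run(s)"
```

    With GNU xargs the first count is expected to be 1 and the second 0. BSD xargs may differ, because it does not run the command on empty input by default.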

    Note: xargs by default parses ', " and \. This is why the following is possible and will work:

    echo '"file 1" "file 2" "file 3"' | xargs -n1 bash -c 'my_function "$@"' --
    echo "'file 1' 'file 2' 'file 3'" | xargs -n1 bash -c 'my_function "$@"' --
    echo 'file\ 1 file\ 2 file\ 3' | xargs -n1 bash -c 'my_function "$@"' --
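
    This quote parsing can also be used to turn the one-line quoted list from the question into a zero-separated stream. The sketch below assumes bash 4.4 or newer (for mapfile -d '') and an external printf that understands \0 in its format, as the coreutils one does; the variable names list and files are illustrative.

```shell
#!/usr/bin/env bash
# The one-line, quoted list from the question.
list='"file 1" "file 2" "file 3"'

# xargs strips the quotes and hands each name to printf as a separate argument;
# printf then re-emits every name terminated by a NUL byte.
# mapfile -d '' -t reads that NUL-separated stream back into an array.
mapfile -d '' -t files < <(printf '%s\n' "$list" | xargs printf '%s\0')

printf 'count=%d\n' "${#files[@]}"
printf '<%s>\n' "${files[@]}"
```

    Instead of reading the stream into an array, the same xargs printf '%s\0' output could be piped straight into parallel -0 my_function. Names that themselves contain quotes or backslashes would still be interpreted by xargs, so this only suits lists quoted as in the question.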
    

    This parsing can produce surprising results, so remember to almost always pass the -d option to xargs:

    $ # note: each backslash escape \x is replaced by the bare character x
    $ echo '\\a\b\c' | xargs
    \abc
    $ # quotes are parsed and need to match
    $ echo 'abc"def' | xargs
    xargs: unmatched double quote; by default quotes are special to xargs unless you use the -0 option
    $ echo "abc'def" | xargs
    xargs: unmatched single quote; by default quotes are special to xargs unless you use the -0 option
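
    For comparison, here is a short sketch of the same kind of input with an explicit newline delimiter. It assumes GNU xargs, since -d is a GNU extension; the sample strings are made up for illustration.

```shell
#!/usr/bin/env bash
# Assumes GNU xargs. With -d '\n' every input line is one argument,
# and quotes and backslashes are passed through untouched.
printf '%s\n' 'abc"def' "abc'def" 'a\b' | xargs -d '\n' -n1 printf '[%s]\n'
```

    Each line should come back in brackets exactly as it was written, with no "unmatched quote" error.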
    

    xargs is a portable tool available almost everywhere, while parallel is a GNU program that has to be installed separately.