Tags: bash, shell, whitespace, expansion, quoting

Why can't I double-quote a variable with several parameters in it?


I'm writing a bash script that uses rsync to synchronize directories. According to the Google shell style guide:

  • Always quote strings containing variables, command substitutions, spaces or shell meta characters, unless careful unquoted expansion is required.
  • Use "$@" unless you have a specific reason to use $*.

I wrote the following test case scenario:

#!/bin/bash

__test1(){
  echo stdbuf -i0 -o0 -e0 $@
  stdbuf -i0 -o0 -e0 $@
}

__test2(){
  echo stdbuf -i0 -o0 -e0 "$@"
  stdbuf -i0 -o0 -e0 "$@"
}


PARAM+=" --dry-run "
PARAM+=" mirror.leaseweb.net::archlinux/"
PARAM+=" /tmp/test"


echo "test A: ok"
__test1 nice -n 19 rsync $PARAM 

echo "test B: ok"
__test2 nice -n 19 rsync $PARAM

echo "test C: ok"
__test1 nice -n 19 rsync "$PARAM"

echo "test D: fails"
__test2 nice -n 19 rsync "$PARAM"

(I need stdbuf so that I can observe output immediately in the longer script I'm running.)

So, my question is: why does test D fail with the below message?

rsync: getaddrinfo:  --dry-run  mirror.leaseweb.net 873: Name or service not known

The echo in every test looks the same. If I'm supposed to quote all variables, why does it fail in this specific scenario?


Solution

  • I agree with @Fred — using arrays is best. Here's a bit of explanation, and some debugging tips.

    Before running the tests, I added

    echo "$PARAM"
    set|grep '^PARAM='
    

    to actually show what PARAM is.** In your original test, it is:

    PARAM=' --dry-run  mirror.leaseweb.net::archlinux/ /tmp/test'
    

    That is, it is a single string that contains multiple space-separated pieces.

    As a rule of thumb (with exceptions!*), bash will split words unless you tell it not to. In tests A and C, the unquoted $@ in __test1 gives bash an opportunity to split $PARAM. In test B, the unquoted $PARAM in the call to __test2 has the same effect. Therefore, rsync sees each space-separated item as a separate parameter in tests A-C.

    In test D, the "$PARAM" passed to __test2 is not split when __test2 is called, because of the quotes. Therefore, __test2 sees only one parameter from PARAM in $@. Then, inside __test2, the quoted "$@" keeps that parameter together, so it is not split at the spaces. As a result, rsync receives the entirety of PARAM as a single argument and treats everything before the :: as the hostname. That is the " --dry-run  mirror.leaseweb.net" shown in the error message, so the lookup fails.
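The four cases can be checked without running rsync at all. The following is a minimal sketch, assuming a hypothetical count_args helper that only reports how many arguments it received; forward_unquoted and forward_quoted are illustrative stand-ins for __test1 and __test2, not part of the original script.

```shell
#!/bin/bash
# Hypothetical helper: print the number of arguments received.
count_args() {
  echo "$#"
}

# Stand-ins for __test1 (unquoted $@) and __test2 (quoted "$@").
forward_unquoted() { count_args $@; }
forward_quoted()   { count_args "$@"; }

# Same value as the PARAM string built up in the question.
PARAM=" --dry-run  mirror.leaseweb.net::archlinux/ /tmp/test"

a=$(forward_unquoted rsync $PARAM)     # like test A: split at the call site
b=$(forward_quoted   rsync $PARAM)     # like test B: split at the call site
c=$(forward_unquoted rsync "$PARAM")   # like test C: split inside the function
d=$(forward_quoted   rsync "$PARAM")   # like test D: never split

echo "A=$a B=$b C=$c D=$d"
```

With the default IFS this prints A=4 B=4 C=4 D=2: only in case D does the command end up with "rsync" plus one long argument instead of "rsync" plus three separate ones.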

    If you use Fred's solution, the output from set|grep '^PARAM=' is

    PARAM=([0]="--dry-run" [1]="mirror.leaseweb.net::archlinux/" [2]="/tmp/test")
    

    That is bash's internal notation for an array: PARAM[0] is "--dry-run", etc. You can see each word individually. echo $PARAM is not very helpful for an array, since it only outputs the first word (here, --dry-run).
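Fred's answer is not quoted here, so the following is only a sketch of what an array-based version might look like, built to match the array contents shown above. show_args is a hypothetical stand-in for __test2 that prints its arguments instead of running stdbuf and rsync.

```shell
#!/bin/bash
# Build the parameters as an array: one element per intended argument.
PARAM=()
PARAM+=(--dry-run)
PARAM+=("mirror.leaseweb.net::archlinux/")
PARAM+=(/tmp/test)

# Hypothetical stand-in for __test2: print each received argument on its
# own line, using a quoted "$@" exactly as the style guide recommends.
show_args() {
  printf '<%s>\n' "$@"
}

# "${PARAM[@]}" expands to one word per element, even inside double quotes.
out=$(show_args nice -n 19 rsync "${PARAM[@]}")
echo "$out"

# Referencing the array without a subscript yields only element 0.
first=$PARAM
echo "first element: $first"
```

Because each element is its own word, quoting everything (both "${PARAM[@]}" at the call site and "$@" inside the function) no longer glues the arguments together.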

    Edits

    * As Fred points out, one exception is that, in the assignment A=$B, the value of B will not be word-split. That is, A=$B and A="$B" are the same.
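A small sketch of that exception, assuming a hypothetical count_args helper for the comparison case:

```shell
#!/bin/bash
# Hypothetical helper: print the number of arguments received.
count_args() {
  echo "$#"
}

B="one   two  three"

A=$B      # right-hand side of an assignment: no word splitting
C="$B"    # quoted form gives the identical result

if [ "$A" = "$C" ]; then
  echo "A and C are identical"
fi

# In a command's argument list, the same unquoted expansion IS split.
n=$(count_args $B)
echo "unquoted \$B as an argument became $n words"
```

The runs of spaces inside B survive the unquoted assignment, while the unquoted command argument is split into three words.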

    ** As ghoti points out, instead of set|grep '^PARAM=', you can use declare -p PARAM. The declare builtin with the -p switch will print out a line that you could paste back into the shell to recreate the variable. In this case, that output is:

    declare -a PARAM='([0]="--dry-run" [1]="mirror.leaseweb.net::archlinux/" [2]="/tmp/test")'
    

    This is a good option. I personally prefer the set|grep approach because declare -p gives you an extra level of quoting, but both work fine. Edit: As @rici points out, use declare -p if an element of your array might include a newline.

    As an example of the extra quoting, consider unset PARAM ; declare -a PARAM ; PARAM+=("Jim's") (a new array with one element). Then you get:

    set|grep:   PARAM=([0]="Jim's")
          # just an apostrophe ^
    declare -p: declare -a PARAM='([0]="Jim'\''s")'
          #    a bit uglier, in my opinion ^^^^
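The remark that declare -p prints a line you could paste back into the shell can be tried directly. This is a minimal sketch with made-up array contents; it uses eval only on text that declare -p itself produced. The exact text in saved (for example, whether the outer single quotes appear) may differ between bash versions, so the sketch does not depend on it.

```shell
#!/bin/bash
# Illustrative array, including an apostrophe and an embedded space.
PARAM=(--dry-run "Jim's" "two words")

# Capture the definition, destroy the variable, then recreate it
# from the captured text.
saved=$(declare -p PARAM)
unset PARAM
eval "$saved"

# Each element should come back as a separate, intact word.
printf '<%s>\n' "${PARAM[@]}"
```

The extra level of quoting in the declare -p output is what makes this round trip safe for elements containing quotes or whitespace.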