I'm writing a bash script that uses rsync to synchronize directories. According to the Google shell style guide:
- Always quote strings containing variables, command substitutions, spaces or shell meta characters, unless careful unquoted expansion is required.
- Use "$@" unless you have a specific reason to use $*.
I wrote the following test case scenario:
#!/bin/bash
__test1(){
echo stdbuf -i0 -o0 -e0 $@
stdbuf -i0 -o0 -e0 $@
}
__test2(){
echo stdbuf -i0 -o0 -e0 "$@"
stdbuf -i0 -o0 -e0 "$@"
}
PARAM+=" --dry-run "
PARAM+=" mirror.leaseweb.net::archlinux/"
PARAM+=" /tmp/test"
echo "test A: ok"
__test1 nice -n 19 rsync $PARAM
echo "test B: ok"
__test2 nice -n 19 rsync $PARAM
echo "test C: ok"
__test1 nice -n 19 rsync "$PARAM"
echo "test D: fails"
__test2 nice -n 19 rsync "$PARAM"
(I need stdbuf so that I can see output immediately in the longer script I'm running.)
So, my question is: why does test D fail with the below message?
rsync: getaddrinfo: --dry-run mirror.leaseweb.net 873: Name or service not known
The echo output in every test looks the same. If I'm supposed to quote all variables, why does it fail in this specific scenario?
I agree with @Fred: using arrays is best. Here's a bit of explanation, and some debugging tips.
Before running the tests, I added
echo "$PARAM"
set|grep '^PARAM='
to actually show what PARAM is.**
In your original test, it is:
PARAM=' --dry-run mirror.leaseweb.net::archlinux/ /tmp/test'
That is, it is a single string that contains multiple space-separated pieces.
As a rule of thumb (with exceptions!*), bash will split words unless you tell it not to. In tests A and C, the unquoted $@ in __test1 gives bash an opportunity to split $PARAM. In test B, the unquoted $PARAM in the call to __test2 has the same effect. Therefore, rsync sees each space-separated item as a separate parameter in tests A-C.
In test D, the "$PARAM" passed to __test2 is not split when __test2 is called, because of the quotes. Therefore, __test2 sees only one parameter in $@. Then, inside __test2, the quoted "$@" keeps that parameter together, so it is not split at the spaces. As a result, rsync receives the entirety of PARAM as a single argument and treats everything before the :: as the hostname, so it fails.
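To see the splitting directly, here is a minimal sketch that counts arguments instead of running rsync. The names count_args, forward_unquoted and forward_quoted are hypothetical helpers made up for this illustration; they mirror __test1 and __test2 but leave out stdbuf and nice.

```bash
#!/bin/bash
# count_args (hypothetical helper): prints how many arguments it received.
count_args() {
  echo "$#"
}

# forward_unquoted mimics __test1: it passes $@ on without quotes.
forward_unquoted() { count_args $@; }
# forward_quoted mimics __test2: it passes "$@" on with quotes.
forward_quoted() { count_args "$@"; }

PARAM=" --dry-run mirror.leaseweb.net::archlinux/ /tmp/test"

forward_unquoted $PARAM     # like test A: split at the call site
forward_quoted   $PARAM     # like test B: split at the call site
forward_unquoted "$PARAM"   # like test C: split inside the function
forward_quoted   "$PARAM"   # like test D: never split
```

This prints 3, 3, 3 and 1: only the last call delivers the whole string as one argument, which is what rsync received in test D.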
If you use Fred's solution, the output from set|grep '^PARAM=' is
PARAM=([0]="--dry-run" [1]="mirror.leaseweb.net::archlinux/" [2]="/tmp/test")
That is bash's internal notation for an array: PARAM[0] is "--dry-run", etc. You can see each word individually. echo $PARAM is not very helpful for an array, since it only outputs the first element (here, --dry-run).
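For reference, a sketch of what the array approach might look like, judging from the set output above (Fred's actual code is not reproduced here, so treat the details as an assumption). show_args is a hypothetical stand-in for rsync that prints each argument it receives in brackets.

```bash
#!/bin/bash
# Build the parameters as an array: one element per argument.
PARAM=()
PARAM+=(--dry-run)
PARAM+=(mirror.leaseweb.net::archlinux/)
PARAM+=(/tmp/test)

# show_args (hypothetical stand-in for rsync): one bracketed line per argument.
show_args() {
  printf '[%s]\n' "$@"
}

# "${PARAM[@]}" expands to one word per element, with no further splitting.
show_args "${PARAM[@]}"

# A plain $PARAM refers to element 0 only.
printf '%s\n' "$PARAM"

# Number of elements in the array.
printf '%s\n' "${#PARAM[@]}"
```

With the array, the call in test D would become __test2 nice -n 19 rsync "${PARAM[@]}", which satisfies the quoting rule and still passes three separate arguments.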
* As Fred points out, one exception is that, in the assignment A=$B, the value of B is expanded but not word-split. That is, A=$B and A="$B" are the same.
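A quick way to check this exception yourself (a small sketch; the variable names are arbitrary):

```bash
#!/bin/bash
B="one   two  three"   # several spaces between the words
A=$B                   # unquoted: still no word splitting in an assignment
C="$B"                 # quoted: same result

# Both copies keep the original spacing.
if [[ $A == "$C" ]]; then
  echo "same"
fi
printf '%s\n' "$A"
```

This prints "same" followed by the string with its spacing intact. The exception covers plain assignments only; the same $B used as a command argument would be split.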
** As ghoti points out, instead of set|grep '^PARAM=', you can use declare -p PARAM. The declare builtin with the -p switch will print out a line that you could paste back into the shell to recreate the variable. In this case, that output is:
declare -a PARAM='([0]="--dry-run" [1]="mirror.leaseweb.net::archlinux/" [2]="/tmp/test")'
This is a good option. I personally prefer the set|grep approach because declare -p gives you an extra level of quoting, but both work fine. Edit: As @rici points out, use declare -p if an element of your array might include a newline.
As an example of the extra quoting, consider unset PARAM ; declare -a PARAM ; PARAM+=("Jim's") (a new array with one element). Then you get:
set|grep: PARAM=([0]="Jim's")
# just an apostrophe ^
declare -p: declare -a PARAM='([0]="Jim'\''s")'
# a bit uglier, in my opinion ^^^^
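The "paste back into the shell" property of declare -p can be checked with a round trip. This is only a sketch: the variable saved is a name chosen for the illustration, and the exact text that declare -p prints may differ between bash versions, so the example compares the restored array rather than the printed line.

```bash
#!/bin/bash
# An array whose elements contain an apostrophe and a space.
PARAM=("Jim's" "two words")

# Capture the declaration, destroy the variable, then recreate it.
saved=$(declare -p PARAM)
unset PARAM
eval "$saved"

# The array comes back with the same elements.
printf '%s\n' "${#PARAM[@]}"
printf '%s\n' "${PARAM[0]}"
printf '%s\n' "${PARAM[1]}"
```

This prints 2, Jim's and two words. The extra quoting that declare -p adds is what lets the line survive being evaluated again.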