Bash escaping issue with $@


I've written a script to simplify running a long launch command:

# in ~/.bash_profile
function runProgram() { sbt "run-main com.longpackagename.mainclass $@ arg3"; };
export -f runProgram;

However, it fails when I try to pass multiple arguments:

$ runProgram arg1 arg2
...
[info] Running com.longpackagename.mainclass arg1

What happened to arg2 and arg3? Were they eaten by bash or by sbt?

The script works as expected if I run it like this:

$ runProgram "arg1 arg2"

--

Additionally: this type of issue happens all the time for me. I would also appreciate a reference on how to escape properly in bash. The first & second resources that I tried didn't address this situation.


Solution

  • The best reference for bash, including how quoting works, is the bash manual itself. It is almost certainly installed on your machine, so you can read it offline by typing man bash. It's a lot to read, but there's no real substitute.

    Nonetheless, I will try to explain this particular issue. There are two important things to know: first, how (and when) bash splits a command line into separate "words" (or command line arguments); second, what $@ and $* mean. These are not entirely unrelated.

    Word-splitting is partially controlled by the special parameter IFS. I mention that only in passing and assume it hasn't been altered. For more details, see man bash.

    Below, I call quoting a string with double-quotes ("...") weak quoting, and quoting with apostrophes ('...') strong quoting. The backslash (\) is also a form of strong quoting.

    Word-splitting happens:

    1. after parameters (shell variables) have been substituted with their values,

    2. wherever there is a sequence of whitespace characters,

    3. except where the whitespace is quoted in any way (" ", ' ', and a backslash before the space are three ways),

    4. before quotes are removed.

    Once a command has been split into words, the first word is used to find the program or function to invoke, and the remaining words become the program's arguments. (I'm ignoring lots of stuff like shell metacharacters, redirections, pipes, etc., etc. For more details, see man bash.)
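
    Here is a minimal sketch of those rules, assuming the default IFS. count_args is a made-up helper for illustration, not a standard command; it simply reports how many words it received.

    ```shell
    # count_args is a hypothetical helper: it prints the number of arguments it got.
    count_args() { printf '%s\n' "$#"; }

    greeting="hello   world"   # a single value containing whitespace

    count_args $greeting       # unquoted: split after substitution, prints 2
    count_args "$greeting"     # weak quoting protects the whitespace, prints 1
    count_args hello\ world    # backslash-quoted space, prints 1
    ```

    The run of three spaces in the unquoted case counts as one separator, which is why the answer is 2 and not 4.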

    Parameters are substituted with their values (step 1) if their name is preceded by a $, unless the $name is strongly quoted (that is, '$name' or, for example, \$name). There are many more forms of parameter substitution. For more details, see man bash.
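
    A small sketch of the difference between weak and strong quoting for substitution. The variable name and the show_quoting wrapper are invented for the example.

    ```shell
    # show_quoting is a hypothetical wrapper that prints one line per quoting style.
    show_quoting() {
      name="world"
      printf '%s\n' "hello $name"    # weak quoting: substituted, prints hello world
      printf '%s\n' 'hello $name'    # strong quoting: literal, prints hello $name
      printf '%s\n' "hello \$name"   # backslash: literal, prints hello $name
    }

    show_quoting
    ```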

    Now, $@ and $* both mean "all of the positional parameters to the current command/function", and if they are used without quotes, they do precisely the same thing. They are replaced by all of the positional parameters, with a single space between each parameter. Since this is a type of parameter substitution (as above), word-splitting happens after the substitution except if the substitution is in quotes, as in the above list.

    If the substitution is in quotes, then according to the above rules, the whitespace which was inserted between the parameters is not subject to word-splitting. And that's precisely how $* works. $* is replaced by the space-separated command-line parameters and the result is word-split; "$*" is replaced by the space-separated command-line parameters as a single word.

    "$@" is an exception. And, in fact, this is why $@ exists at all. If the $@ is inside weak quotes ("$@"), then the quotes are removed, and each positional parameter is individually quoted. These quoted positional parameters are then space-separated and substituted for the $@. Since the $@ is no longer quoted itself, the inserted spaces do cause word-splitting. The final result is that the individual parameters are retained as individual words.

    In case that was not totally clear, here's an example. printf has the virtue of repeating the provided format until it runs out of parameters, which makes it easy to see what's going on.

    showargs() { 
      echo -n '$*:   '; printf "<%s> " $*; echo
      echo -n '"$*": '; printf "<%s> " "$*"; echo
      echo -n '"$@": '; printf "<%s> " "$@"; echo
    }
    
    showargs one two three
    showargs "one two" three
    

    (Try to figure out what that prints before you execute it.)
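
    If you'd rather check your answer by counting words than by reading brackets, here is a sketch along the same lines. count_args and count_each are made-up helpers, and the default IFS is assumed.

    ```shell
    # count_args is a hypothetical helper: it prints the number of arguments it got.
    count_args() { printf '%s\n' "$#"; }

    # count_each reports how many words each form of "all parameters" produces.
    count_each() {
      printf 'unquoted $*: '; count_args $*
      printf 'quoted "$*": '; count_args "$*"
      printf 'quoted "$@": '; count_args "$@"
    }

    count_each "one two" three
    # unquoted $*: 3
    # quoted "$*": 1
    # quoted "$@": 2
    ```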

    It's often said that you almost always want "$@" and almost never $@ or $*. That's generally true, but it's also the case that you almost never want "something with $@ inside of it". To understand that, you need to know what "something with $@ inside of it" does. It's a bit weird, but it shouldn't be unexpected. We'll take the invocation of sbt from the OP as an example:

    sbt "run-main com.longpackagename.mainclass $@ arg3"
    

    with two positional parameters supplied to the function, so that $1 is arg1 and $2 is arg2.

    First, bash removes the quotes around $@. However, it can't just remove them altogether, since there is also quoted text there. So it has to close off the quoted text and reopen the quotes afterwards, producing:

    sbt "run-main com.longpackagename.mainclass "$@" arg3"
    

    Now, it can substitute in the quoted, space-separated arguments:

    sbt "run-main com.longpackagename.mainclass ""arg1" "arg2"" arg3"
    

    This is now word-split:

    sbt
    "run-main com.longpackagename.mainclass ""arg1"
    "arg2"" arg3"
    

    and the quotes are removed:

    sbt
    run-main com.longpackagename.mainclass arg1
    arg2 arg3
    

    sbt is expecting only one positional parameter. You gave it two, and it ignored the second one, which is why arg2 and arg3 never showed up.
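
    You can watch this happen without sbt. In the sketch below, fake_sbt is a hypothetical stand-in that just prints each argument it receives in angle brackets, and run_program mirrors the function from the question.

    ```shell
    # fake_sbt is a hypothetical stand-in for sbt: one line per argument received.
    fake_sbt() { printf '<%s>\n' "$@"; }

    # Same quoting as the original function, with fake_sbt in place of sbt.
    run_program() { fake_sbt "run-main com.longpackagename.mainclass $@ arg3"; }

    run_program arg1 arg2
    # <run-main com.longpackagename.mainclass arg1>
    # <arg2 arg3>
    ```

    Two lines of output means two arguments: the prefix is glued to the first parameter and the suffix to the last.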

    Now, suppose the function were called with a single argument, "arg1 arg2". In that case, the substitution of $@ results in:

    sbt "run-main com.longpackagename.mainclass ""arg1 arg2"" arg3"
    

    and word-splitting produces

    sbt
    "run-main com.longpackagename.mainclass ""arg1 arg2"" arg3"
    

    without quotes:

    sbt
    run-main com.longpackagename.mainclass arg1 arg2 arg3
    

    and there is only one positional parameter for sbt.
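
    Given all that, one possible fix is to use $* inside the quotes, since "$*" joins the parameters into a single word. This is only a sketch: it assumes sbt wants the whole run-main command as one argument and that IFS is unaltered. fake_sbt is again a hypothetical stand-in that prints what it receives.

    ```shell
    # fake_sbt is a hypothetical stand-in for sbt: one line per argument received.
    fake_sbt() { printf '<%s>\n' "$@"; }

    # "$*" inside weak quotes joins all parameters with spaces into one word.
    run_program() { fake_sbt "run-main com.longpackagename.mainclass $* arg3"; }

    run_program arg1 arg2
    # <run-main com.longpackagename.mainclass arg1 arg2 arg3>
    ```

    In the real function you would put sbt back in place of fake_sbt.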