Tags: bash, shell, scripting, quoting

BASH: Unavoidable wordsplitting in subcommand expansion?


So I'm writing a BASH shell script to perform some CLI testing for a Node project I'm working on (I didn't tag Node in this question because this pertains solely to BASH). My CLI testing looks like this:

test_command=$'node source/main.js --input-regex-string \'pcre/(simple)? regex/replace/vim\' -o';
echo $test_command;
$test_command 1>temp_stdout.txt 2>temp_stderr.txt;
test_code=$?;
echo "test_code $test_code"
test_stdout=`cat temp_stdout.txt`;
test_stderr=`cat temp_stderr.txt`;

As you can see, I'm using the C-style quotes $'...', which should make $test_command expand literally to node source/main.js --input-regex-string 'pcre/(simple)? regex/replace/vim' -o, and that is what the echo on line 2 shows. However, when I attempt to run the command on line 3, I get an error saying that regex/replace/vim' isn't a recognised command-line parameter in my script. Obviously, despite me seemingly quoting and escaping everything correctly, BASH is still splitting the regex/replace/vim' part into its own word. Based on everything I've read about BASH's quoting and word-splitting rules, this shouldn't be happening, and yet it is.

Here is what I've tried so far, changing the escaping of the command string to fit whichever quote style is being used:

  • Replacing the C-style quotes on the first line with strong/literal ' quotes ('node source/main.js --input-regex-string "pcre/(simple)? regex/replace/vim" -o'), which just causes line 3 to treat the entire thing as one word and thus not work.
  • Replacing them with weak/dynamic " quotes ("node source/main.js --input-regex-string 'pcre/(simple)? regex/replace/vim' -o"), which behaves exactly like the strong-quote example. Besides, since the quoted string is a regular expression literal, it's not a good fit for the magic expansion behaviour of " anyway.
  • Adding additional escaping to the string, such as test_command=$'node source/main.js --input-regex-string \\\'pcre/(simple)?\ regex/replace/vim\\\' -o', only to witness the exact same behaviour.
  • Changing the way I invoke the command on line 3: quoting the expansion, or encasing it in { ... } or ${ ... }, combined with the variations above. All of these resulted in either the original word-splitting problem or a generic "bad substitution" syntax error.

So, in short, my question is: what is the correct way to invoke/format a command, stored as a string in a BASH variable and containing a quoted literal string, so that BASH won't inexplicably word-split the contained quoted string and break the whole command?


Solution

  • what is the correct way to invoke/format a command, stored as a string in a BASH variable, containing a quoted literal string

    You assume that there is no difference between

    1. typing a command directly into the terminal/script
    2. storing the exact same command string into a variable and then executing $variable.

    But there are many differences! Commands typed directly into bash undergo more processing steps than anything else. These steps are documented in bash's manual:

    1. Tokenization
      Quotes are interpreted. Operators are identified. The command is split into words at whitespace between unquoted parts. IFS is not used here.
    2. Several expansions in a left-to-right fashion. That is, after one of these transformations has been applied to a token, bash continues to process its result with step 3. For example, you could safely use a home directory with a literal $ in its pathname, because the result of expanding ~ does not undergo variable expansion, so the $ remains uninterpreted.
      • brace expansion {1..9}
      • tilde expansion ~
      • parameter and variable expansion $var
      • arithmetic expansion $((...))
      • command substitution $(...), `...`
      • process substitution <()
    3. Word splitting
      Split the result of unquoted expansions using IFS.
    4. Filename expansion
      Also known as globbing: *, ?, [...] and more with shopt -s extglob.
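
The ordering of the steps above can be observed with a small sketch. The variable names below (n, brace_result, loop_result, tilde_result) and the home directory value are made up for illustration. The subshell that changes HOME is only there so that the rest of the script is unaffected.

```bash
#!/usr/bin/env bash
n=3

# Brace expansion is attempted before variable expansion. At that point
# {1..$n} is not a valid sequence, so it is left alone; $n is expanded later.
brace_result=$(echo {1..$n})

# One common alternative when the upper bound is a variable:
# an arithmetic for loop, which does not rely on brace expansion.
nums=()
for ((i = 1; i <= n; i++)); do
  nums+=("$i")
done
loop_result="${nums[*]}"

# The result of tilde expansion does not undergo variable expansion.
# HOME is set to a hypothetical path containing a literal $ inside a subshell.
tilde_result=$(HOME='/tmp/$n'; echo ~)

echo "brace: $brace_result"   # brace: {1..3}
echo "loop:  $loop_result"    # loop:  1 2 3
echo "tilde: $tilde_result"   # tilde: /tmp/$n
```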
    Admittedly, this confuses most bash beginners. It seems to me that most of Stack Overflow's bash questions are about things related to these processing steps. Two classic examples are "for i in {1..$n} does not work" and "echo $var does not print what I assigned to var".

    Strings from unquoted variables only undergo some of the processing steps listed above. As described, these steps are "3. word splitting" and "4. filename expansion".
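
To see why the quotes inside the variable are not honoured, here is a minimal sketch. show_args is a hypothetical helper (not part of the question's project) that prints each argument it receives in angle brackets, standing in for the real command.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print every received argument on its own line.
show_args() { printf '<%s>\n' "$@"; }

# Typed directly: tokenization interprets the quotes, giving three arguments.
direct=$(show_args --input 'two words' -o)

# Stored in a string: the quote characters are plain data. The unquoted
# expansion is only word-split on IFS, so the quotes stay attached to the
# words and the helper receives four arguments.
cmd="show_args --input 'two words' -o"
from_var=$($cmd)

echo "direct:";   echo "$direct"
echo "from_var:"; echo "$from_var"
```

The second call prints <'two> and <words'> as separate arguments, which is the same failure as the regex/replace/vim' error in the question.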

    If you want to apply all processing steps to a string, you can use the eval command. However, this is very frowned upon as there are either better alternatives (if you define the command yourself) or huge security implications (if an outsider defines the command).
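
For completeness, a sketch of what eval does and why it is risky. Again, show_args is a hypothetical stand-in for the real command, and the "untrusted" string is invented for the demonstration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print every received argument on its own line.
show_args() { printf '<%s>\n' "$@"; }

# eval re-runs all processing steps on the string, including tokenization,
# so the embedded quotes are interpreted and 'two words' stays one argument.
cmd="show_args --input 'two words' -o"
eval_result=$(eval "$cmd")

# The same mechanism executes anything else the string contains.
# Here a second command hidden after ';' runs as well.
untrusted="show_args hello; echo INJECTED"
injected=$(eval "$untrusted")

echo "$eval_result"
echo "$injected"
```

If the string ever comes from outside the script, the second case is exactly the security problem mentioned above.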

    In your example, I don't see a reason to store the command at all. But if you really want to access it as a string somewhere else, then use an array:

    command=(node source/main.js --input-regex-string 'pcre/(simple)? regex/replace/vim' -o)
    echo "${command[*]}" # print
    "${command[@]}"      # execute
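
Applied to the test script from the question, the array approach might look like the sketch below. fake_cli is a hypothetical stand-in for node source/main.js so that the example runs anywhere; its output and exit code are made up. The temporary files come from mktemp instead of fixed names, which is a choice of this sketch and not something the question requires.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for `node source/main.js`: echoes its arguments,
# writes a line to stderr and returns a non-zero exit code.
fake_cli() {
  printf 'arg: %s\n' "$@"
  echo 'warning: demo only' >&2
  return 3
}

# Each array element is exactly one argument, spaces and all.
test_command=(fake_cli --input-regex-string 'pcre/(simple)? regex/replace/vim' -o)

# Print the command in a shell-quoted form for logging.
printf '%q ' "${test_command[@]}"; echo

# Run it, capturing stdout, stderr and the exit code separately.
out_file=$(mktemp)
err_file=$(mktemp)
"${test_command[@]}" >"$out_file" 2>"$err_file"
test_code=$?
test_stdout=$(<"$out_file")
test_stderr=$(<"$err_file")
rm -f "$out_file" "$err_file"

echo "test_code $test_code"
```

The quoted "${test_command[@]}" expands to one word per element without any further word splitting or globbing, so the regex argument reaches the command intact.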