Tags: bash, parsing, lexer, expander

Which runs first in bash: the lexer or the expander?


I am trying to understand bash's parser and lexer mechanism. (My ultimate goal is implementing a bash-like shell).

The first case

$ test='o a'
$ ech$test
a

(Edit: I removed the double quotes from the second line. This is the test case I actually ran.)

The expander expanded the command and found the argument after expanding.
Expanded full command: echo a

So, I can assume that the lexer runs after the expansion, because bash understood that "echo a" as a whole is not a command name: "echo" is the command name, and "a" is an argument. (By the way, zsh doesn't do that.)

The second case

$ test="'"
$ echo $test
'

Echo prints only one single quote. However, if we expand this string to: echo ', it is not a valid command because it has an unclosed quote. So, I can assume two things:

  1. The lexer runs first and works out what the tokens are, and expansion happens afterwards.

  2. Actually, the value of the 'test' variable is not one single quote. Its value is exactly: "'". So, in reality, we don't expand to echo ' we expand to echo "'", which is valid.

But the first assumption contradicts the conclusion I drew from the first test case. So, I assume the second one.

The third case

$ test="'"
$ echo "$test"
'

Echo prints only one single quote again. However, (I assume) it expanded this string to: echo ""'"", which is invalid because we have an unclosed quote.

So, my question is: "How does bash understand what I mean?"


Solution

  • The POSIX specification includes a detailed description of the behavior of the shell command language, including considerable detail about how it processes input. It begins with:

    The shell shall read its input in terms of lines. (For details about how the shell reads its input, see the description of sh.) The input lines can be of unlimited length. These lines shall be parsed using two major modes: ordinary token recognition and processing of here-documents.

    It continues from there with details of how lines are tokenized.

    Only after a line has been tokenized can any kind of substitution or expansion be performed, because only at that point can the shell recognize where substitutions and expansions are called for.
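A minimal sketch of that ordering, assuming a POSIX-style shell such as bash with the default IFS. The variable names `cmd`, `expanded`, and `evaluated` are illustrative only. A `;` that arrives through parameter expansion is never seen by the tokenizer, so it stays ordinary text, whereas `eval` feeds the expanded string back through token recognition:

```shell
# A string that would be two commands if it were tokenized as source text.
cmd='echo one; echo two'

# Unquoted expansion: the result is field-split into the words
# "echo", "one;", "echo", "two", but it is NOT re-tokenized,
# so ';' is just a character in an argument.
expanded=$($cmd)
printf '%s\n' "$expanded"     # prints: one; echo two

# eval runs token recognition again on the expanded text,
# so here ';' really is a command separator.
evaluated=$(eval "$cmd")
printf '%s\n' "$evaluated"    # prints "one" and "two" on separate lines
```

For someone implementing a shell, this suggests that the expander should operate on words already produced by the lexer and should never hand its output back to the lexer, except for constructs like `eval` that ask for it explicitly.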

    The first case

    $ test='o a'
    $ ech"$test"
    a
    

    I don't believe you. I cannot reproduce that result in Bash 4.3, nor would I expect to. The POSIX specification and the Bash manual are explicit and in agreement on this point: parameter expansions that occur inside double quotes are not subject to word splitting.

    With respect to the order of command-line processing, what happens is that ech"$test" is recognized as a single token. Parameter expansion, field splitting, and quote removal apply to that token ("word" in shell jargon), with the overall result that it expands to a single word echo a, which, by virtue of its position in the fully-expanded command line, is interpreted as a command name.

    It would be different if you instead did

    $ ech$test
    

    where the $test parameter expansion was not quoted. I suspect that's what you actually did in Bash to get the output a. In this case, the expansion of $test is not protected from word splitting, so after expanding the word to the (single) word echo a, that is split into two words at the space. The result is echo as command name and a as its argument.
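The difference between the quoted and unquoted forms can be checked with a small sketch like the following, assuming the default IFS and that no command named "echo a" exists on the system. The helper `count_args` and the result variables are hypothetical names introduced for illustration:

```shell
# Hypothetical helper: report how many words it received.
count_args() { echo "$#"; }

test='o a'

# Unquoted: "echo a" is field-split at the space into two words.
unquoted=$(count_args ech$test)
echo "$unquoted"              # prints: 2

# Quoted: the expansion is protected, so one word remains.
quoted=$(count_args ech"$test")
echo "$quoted"                # prints: 1

# In command position, the unquoted form runs echo with the argument a.
out=$(ech$test)
echo "$out"                   # prints: a

# The quoted form looks for a command literally named "echo a".
status=0
ech"$test" 2>/dev/null || status=$?
echo "$status"                # prints: 127 (command not found)
```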

    The second case

    $ test="'"
    $ echo $test
    '
    

    Yes, as the spec describes, quote characters are recognized during token recognition. Quote characters introduced into a command by parameter expansion are significant only as themselves.

    2. Actually, the value of the 'test' variable is not one single quote.

    Yes, it is. And this can be tested in a variety of ways, such as (your third case) echo "$test", or echo ${#test}, or (in bash) echo "${test:0:1}".

    Its value is exactly: "'". So, in reality, we don't expand to echo ' we expand to echo "'", which is valid.

    No, it isn't. See above for the actual explanation of the behavior you observed.
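The checks mentioned for the second and third cases can be run directly. This is only a sketch, and the names `len`, `plain`, and `quoted` are illustrative. It uses `${#test}`, which is available in any POSIX shell, rather than the bash-only substring form:

```shell
test="'"

# The value is exactly one character long: the single quote itself.
len=${#test}
echo "$len"                   # prints: 1

# Second case: the quote produced by expansion is plain data,
# so it does not open a quoted string.
plain=$(echo $test)
echo "$plain"                 # prints: '

# Third case: the double quotes were consumed during token recognition
# and quote removal; the expanded quote is again just a character.
quoted=$(echo "$test")
echo "$quoted"                # prints: '
```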

    Overall, I strongly recommend relying first and foremost on the specifications for the behavior you want to implement. Experimenting is a fine way to try to clarify and solidify your interpretation of the specs, but it is a very unreliable way to determine the details of the required behavior.