
How does the \ character produce different outputs when calling a shell command from awk (shell expansions and quote removal)?


To convert decimal 32 into its hex value with printf:

printf '%x\n'  32
20

The correct way to call the above command from awk is:

awk 'BEGIN{system("printf '\''%x\n'\''  32")}'
20

My question is about something else.
In the following call, where does the \ go?
Why is the \ character lost?

~$ awk 'BEGIN{system("printf '%x\n'  32")}'
20n~$ 

The \n is interpreted neither as a newline nor as a literal \n, but as a plain n. Why does the \ get lost?
To make my question clearer:

awk 'BEGIN{system("printf '%xn'  32")}' 
awk 'BEGIN{system("printf '%x\n'  32")}'

Why do these two awk statements output the same string? I have since learned more about shell expansions and quote removal.

From the GNU Bash manual, section on quote removal:

After the preceding expansions, all unquoted occurrences of the characters ‘\’, ‘'’, and ‘"’ that did not result from one of the above expansions are removed.

This explains why awk 'BEGIN{system("printf '%xn' 32")}' and awk 'BEGIN{system("printf '%x\n' 32")}' behave identically.
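Quote removal can be watched in isolation by handing the same token, quoted in different ways, to printf '%s', which prints each argument verbatim. A minimal sketch, assuming bash (or any POSIX shell) as the interactive shell:

```shell
# printf reuses its format once per argument, so each argument gets its own line.
# Unquoted: \n is a quoted n, and quote removal drops the backslash.
# Single quotes: nothing inside is special, so the backslash survives.
# Double quotes: \ is only special before $ ` " \ or newline, so \n survives.
printf '[%s]\n' %x\n %xn '%x\n' "%x\n"
# -> [%xn]
# -> [%xn]
# -> [%x\n]
# -> [%x\n]
```

Only the first two arguments come out as %xn, which is why the two awk commands above receive the same program text.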

But what about the following, where there is more than one \?

awk 'BEGIN{system("printf '%x\\n'  32")}'
0sh: 2: 32: not found

Why are these two not equivalent?

awk 'BEGIN{system("printf '%x\\n'  32")}'
awk 'BEGIN{system("printf '%xn'  32")}'

It gets more interesting when we add more backslashes:

$ awk 'BEGIN{system("printf '%x\\\n'  32")}'
0sh: 2: 32: not found

With four and five backslashes:

$ awk 'BEGIN{system("printf '%x\\\\n'  32")}'
20n$ awk 'BEGIN{system("printf '%x\\\\\n'  32")}'
20n$ 

The most interesting case of all:

awk 'BEGIN{system("printf '%x\\\\\\n'  32")}'
20$ 

The character n is lost, and no newline is printed.


Solution

  • If the command at hand is more than just an example, it is worth heeding shellter's helpful advice to use awk's built-in printf function - there's no need to call system() with the external printf utility.
    awk 'BEGIN{ printf "%x\n", 32 }' works just fine.
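For the real task, a sketch using only awk's built-in printf. The -v variant, which passes the number in as an awk variable instead of embedding it in the program text, goes slightly beyond the original advice and is shown only as one way to sidestep quoting entirely:

```shell
# awk formats the number itself; no /bin/sh is spawned, so only bash's quoting applies.
awk 'BEGIN { printf "%x\n", 32 }'
# -> 20

# Passing the value with -v keeps data out of the program text altogether.
awk -v n=255 'BEGIN { printf "%x\n", n }'
# -> ff
```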

    Overall, you have 3 layers of quoting to deal with, in sequence:

    • First, the current shell (bash) interprets the tokens of the command - both the quoted and unquoted ones.

    • awk then sees the result of this interpretation and performs its own interpretation of the embedded double-quoted printf command string.

    • The result is passed to system(), which invokes /bin/sh, where the string is again interpreted by the shell (sh, in this case).
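One way to observe each layer separately is to stop the pipeline early: printf '%s\n' shows the exact program text bash hands to awk, and replacing system() with print shows the exact string awk would hand to /bin/sh. A sketch for the working command from the question, assuming a POSIX awk such as gawk or mawk:

```shell
# Layer 1 (bash -> awk): the exact program text awk receives.
printf '%s\n' 'BEGIN{system("printf '\''%x\n'\''  32")}'
# -> BEGIN{system("printf '%x\n'  32")}

# Layer 2 (awk -> sh): print instead of system() shows what /bin/sh would get.
# awk has already turned \n into a real newline inside the single quotes.
awk 'BEGIN{print("printf '\''%x\n'\''  32")}'
# -> printf '%x
# -> '  32

# Layer 3 (sh): the quoted newline becomes part of printf's format.
awk 'BEGIN{system("printf '\''%x\n'\''  32")}'
# -> 20
```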

    Your original command:

    awk is incidental to your first command; it is the shell's (Bash's) string quoting that matters:

    • POSIX-like shells such as Bash allow string concatenation by placing any mix of unquoted, single-quoted, and double-quoted (interpolated, expanded) strings directly next to each other.

    • Single-quoted strings - '...' - do not support nesting.

    • Therefore, 'BEGIN{system("printf '%x\n' 32")}' breaks down as:

      • 'BEGIN{system("printf ' - a single-quoted shell string whose contents are used as-is.

      • %x\n, an unquoted shell string that is subject to shell expansions:

        • %x is used as-is.
        • Prefixing a character with \ quotes that single character: \<char> tells the shell that <char> is to be taken literally, so \n turns into plain n. This form of quoting is only needed for shell metacharacters (characters such as | that normally have special meaning); since n is not one of them, \n and n are effectively the same, so %xn, without the \, would result in the same literal.
      • ' 32")}', a single-quoted shell string whose contents are used as-is.

    • Thus, after expansions, which include the removal of the quoting characters (' and \ in this case, a process known as quote removal), and after concatenation, the shell ultimately passes the following literal to awk:
      BEGIN{system("printf %xn 32")}

    • As you can see, awk never sees the \.

      • Because awk finds no escape sequences in the string "printf %xn 32", it passes the literal printf %xn 32 to the system() function, which in turn invokes /bin/sh with the specified string.
      • Thus, shell command printf %xn 32 is executed, which prints 20n, without a trailing newline.
        • Note that this command, due to being passed to /bin/sh, is again subject to shell expansions (in essentially the same way as in bash, except for process substitution and, potentially, Bash-specific parameter expansions), but they result in no change in this case.
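Applying the same stop-early technique to the original command confirms this breakdown. A sketch assuming bash plus a POSIX awk; sh -c stands in for what system() does with its argument:

```shell
# What bash passes to awk: quote removal has already dropped the \.
printf '%s\n' 'BEGIN{system("printf '%x\n' 32")}'
# -> BEGIN{system("printf %xn 32")}

# What awk would pass to /bin/sh: there is no escape sequence left to process.
awk 'BEGIN{print("printf '%x\n' 32")}'
# -> printf %xn 32

# What /bin/sh runs; it prints no trailing newline, so echo adds one.
sh -c 'printf %xn 32'; echo
# -> 20n
```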

    Your follow-up questions:

    Building on the explanations above:

    awk 'BEGIN{system("printf '%x\\n' 32")}' results in the following literal awk script:

    BEGIN{system("printf %x\n  32")}
    

    \\n, from the shell's perspective, is \\ - a quoted \ character - followed by n, resulting in literal \n.

    In this case, the control-character escape sequence \n is interpreted by awk and converted to an actual newline before the string is passed to /bin/sh by the system() function.

    Thus, /bin/sh sees two commands:

    printf %x
    32
    

    printf %x prints 0, because in the absence of an argument for the %x format specifier, the value defaults to 0. 32 by itself is not a valid command, hence the error message sh: 2: 32: not found (2 is the line number).
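The two-backslash case, checked one layer at a time under the same assumptions (bash, a POSIX awk, and sh -c in place of system()); the last line runs only the first of the two resulting commands:

```shell
# Layer 1 (bash): \\ is a quoted backslash, so awk receives the escape sequence \n.
printf '%s\n' 'BEGIN{system("printf '%x\\n' 32")}'
# -> BEGIN{system("printf %x\n 32")}

# Layer 2 (awk): \n becomes a real newline, so /bin/sh gets two lines.
awk 'BEGIN{print("printf '%x\\n' 32")}'
# -> printf %x
# ->  32

# Layer 3 (sh), first line only: %x with no argument prints 0.
sh -c 'printf %x'; echo
# -> 0
```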


    awk 'BEGIN{system("printf '%x\\\n' 32")}' is the same as the previous command:

    \\\n, from the shell's perspective, is \\ - a quoted \ character - followed by \n - a quoted n character - again resulting in literal \n.
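Since only the bash layer differs between the two- and three-backslash versions, printf '%s' alone is enough to show that both arguments converge on the same text:

```shell
# Two backslashes: \\ is a quoted \, followed by a plain n.
printf '%s\n' %x\\n
# -> %x\n

# Three backslashes: \\ is a quoted \, then \n is a quoted n.
printf '%s\n' %x\\\n
# -> %x\n
```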


    As for awk 'BEGIN{system("printf '%x\\\\n' 32")}':

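    \\\\n results in literal \\n. (The five-backslash variant, \\\\\n, yields the same \\n, because its trailing \n is simply a quoted n.)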

    awk in turn interprets that as literal \n.

    /bin/sh then again interprets that \n as an individually quoted n literal, effectively executing printf %xn 32, which yields 20n, without a trailing newline.
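The four-backslash case can be traced the same way. Each line below reproduces a single layer by hand (bash, then awk, then sh) rather than chaining them, so treat it as a sketch of each step in isolation:

```shell
# Layer 1 (bash): each \\ becomes \, so four backslashes leave two.
printf '%s\n' %x\\\\n
# -> %x\\n

# Layer 2 (awk): the string escape \\ becomes a single \.
awk 'BEGIN{print "%x\\n"}'
# -> %x\n

# Layer 3 (sh): unquoted \n is a quoted n, so the backslash is removed again.
sh -c 'printf "%s\n" %x\n'
# -> %xn
```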


    As for awk 'BEGIN{system("printf '%x\\\\\\n' 32")}':

    awk sees \\\n, which it turns into literal \, followed by an actual newline. Therefore, what is ultimately passed to /bin/sh by awk looks like this:

    printf %x\
      32
    

    The above, which contains a \-escaped actual newline, is interpreted as a single command by /bin/sh (this is how line continuation works in the shell), so it is effectively the same as
    printf %x 32 and results in 20, without a trailing newline.
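For the six-backslash case, printing instead of calling system() reveals the trailing backslash, and feeding the same two lines to sh -c shows line continuation at work. Inside bash's single quotes the backslash and newline are kept literally, so sh -c sees the same text that awk would pass. A sketch under the same assumptions as before:

```shell
# What awk hands to /bin/sh: the first line ends in a backslash.
awk 'BEGIN{print("printf '%x\\\\\\n' 32")}'
# -> printf %x\
# ->  32

# sh removes the backslash-newline pair before running the command.
sh -c 'printf %x\
 32'; echo
# -> 20
```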