bash, awk, posix

Should awk expand escape sequences in command-line assigned variables?


I've recently discovered that Awk's -v VAR=VAL syntax for initializing variables on the command line expands escape sequences in VAL. I previously thought that it was a good way to pass strings into Awk without needing to run an escaping function over them first.

For example, the following script:

awk -v VAR='x\tx' 'BEGIN{printf("%s\n", VAR);}'

I would expect to print

x\tx

but actually prints:

x       x

An aside: environment variables can be used to pass strings in unmodified instead; this question isn't asking how to get the behaviour I previously expected.
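For reference, the environment-variable approach mentioned in the aside might look like the sketch below. It assumes a POSIX-style awk that provides the ENVIRON array; the shell variable name `literal` is only for illustration.

```shell
# Pass the string through the environment instead of -v.
# Values read from ENVIRON are not run through escape-sequence processing.
literal=$(VAR='x\tx' awk 'BEGIN { printf("%s\n", ENVIRON["VAR"]) }')
printf '%s\n' "$literal"
```

This prints x\tx with the backslash intact, because the value never passes through awk's string-constant lexing.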

Here's what the man page has to say on the matter:

-v var=val, --assign var=val
    Assign the value val to the variable var, before execution of the program begins. Such variable values are available to the BEGIN block of an AWK program.

And further down:

String Constants
    String constants in AWK are sequences of characters enclosed between double quotes (like "value"). Within strings, certain escape sequences are recognized, as in C. These are:

... list of escape sequences ...

The escape sequences may also be used inside constant regular expressions (e.g., /[ \t\f\n\r\v]/ matches whitespace characters).

In compatibility mode, the characters represented by octal and hexadecimal escape sequences are treated literally when used in regular expression constants. Thus, /a\52b/ is equivalent to /a*b/.

The way I read this, val in -v var=val is not a string constant, and there is no text to indicate that the string constant escaping rules apply.

My questions:

  1. Is there a more authoritative source for the awk language than the man page, and if so what does it specify?
  2. What does POSIX have to say about this, if anything?
  3. Do all versions of Awk behave this way, i.e. can I rely on the expansion being done if I actually want it?

Solution

  • POSIX requires the value in the assignment to be interpreted as a string constant, so escape sequences in it are expanded.

    The relevant sections from the POSIX standard are:

    -v assignment The application shall ensure that the assignment argument is in the same form as an assignment operand. The specified variable assignment shall occur prior to executing the awk program, including the actions associated with BEGIN patterns (if any). Multiple occurrences of this option can be specified.

    and

    An operand that begins with an <underscore> or alphabetic character from the portable character set (see the table in XBD Portable Character Set), followed by a sequence of underscores, digits, and alphabetics from the portable character set, followed by the = character, shall specify a variable assignment rather than a pathname. The characters before the = represent the name of an awk variable; if that name is an awk reserved word (see Grammar) the behavior is undefined. The characters following the <equals-sign> shall be interpreted as if they appeared in the awk program preceded and followed by a double-quote (") character, as a STRING token (see Grammar), except that if the last character is an unescaped <backslash>, it shall be interpreted as a literal <backslash> rather than as the first character of the sequence \".
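The rule quoted above can be checked with a short sketch. It assumes an awk that follows the POSIX text; the shell variable names `same`, `kept` and `operand` are only for illustration. The trailing-backslash exception is left out, since implementations may differ there.

```shell
# The -v value behaves like the same text written between double quotes
# inside the program, so the two strings compare equal.
same=$(awk -v VAR='x\tx' 'BEGIN { if (VAR == "x\tx") print "same"; else print "different" }')
printf '%s\n' "$same"

# Doubling the backslash keeps it literal, just as in a string constant.
kept=$(awk -v VAR='x\\tx' 'BEGIN { printf("%s\n", VAR) }')
printf '%s\n' "$kept"

# The same interpretation applies to assignment operands placed after the
# program text; "-" makes awk read the single empty line from stdin.
operand=$(printf '\n' | awk '{ printf("%s\n", VAR) }' VAR='a\tb' -)
printf '%s\n' "$operand"
```

The first command prints same, the second prints x\tx, and the third prints a and b separated by a tab character.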