
Empty first field when using multiple delimiters


I'm trying to parse the output of a program, which is given like this:

  Status       : OK (97 ms)

Those are all spaces, no tabs. I don't know if that spacing will remain consistent over different versions, so I want to treat spaces and colons as delimiters.

I'm well aware that the field separator can be declared as an arbitrarily complex regular expression, so I expect this would work:

echo "  Status       : OK (97 ms)" | awk -F'[ :]+' '/Status/{print $2}'

But it does not; instead it prints "Status", and $1 is an empty string.

Compare this with the output of the built-in delimiter, where leading delimiters seem to be ignored and $1 is "Status":

echo "  Status       : OK (97 ms)" | awk '/Status/{print $1}'

It's easy enough to print $3 instead, but it makes me wonder what I am doing wrong, or misunderstanding?

I'm using GNU Awk 3.1.7


Solution

  • Because, in the sample input, field separators precede Status, the first field is empty and the second field is Status. Observe:

    $ echo "  Status       : OK (97 ms)" | awk -F'[ :]+' '/Status/{print $2}'
    Status
    $ echo "Status       : OK (97 ms)" | awk -F'[ :]+' '/Status/{print $2}'
    OK
    

    One option is to make : and ( the field separators. In that case, the second field contains OK (with its surrounding spaces) regardless of whether there is leading space:

    $ echo "  Status       : OK (97 ms)" | awk -F'[:(]+' '/Status/{print $2}'
     OK 
    $ echo "Status       : OK (97 ms)" | awk -F'[:(]+' '/Status/{print $2}'
     OK 
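
With this separator the value still carries the spaces around it. If that matters, they can be trimmed before printing. The following is a minimal sketch, assuming a POSIX-compatible awk; the function name `status_value` is made up for illustration and is not part of the original answer:

```shell
# Hypothetical helper: print the Status value with surrounding spaces removed.
status_value() {
    awk -F'[:(]+' '/Status/ {
        sub(/^ +/, "", $2)   # drop leading spaces from field 2
        sub(/ +$/, "", $2)   # drop trailing spaces from field 2
        print $2
    }'
}

echo "  Status       : OK (97 ms)" | status_value   # prints OK
echo "Status       : OK (97 ms)" | status_value     # prints OK
```

Using two anchored `sub` calls rather than one alternation keeps the pattern simple and portable across awk implementations.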
    

    Another option is to keep your field separator but eliminate the leading space before printing:

    $ echo "  Status       : OK (97 ms)" | awk -F'[ :]+' '{sub(/^ +/,"")} /Status/{print $2}'
    OK
    $ echo "Status       : OK (97 ms)" | awk -F'[ :]+' '{sub(/^ +/,"")} /Status/{print $2}'
    OK
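
A variation on the same idea leaves the record untouched and instead shifts the field index when the first field is empty. This is only a sketch under the assumption that the line either starts with separators or starts directly with the label; `status_field` is a hypothetical name:

```shell
# Hypothetical helper: pick the value field whether or not the line
# begins with a separator.
status_field() {
    awk -F'[ :]+' '/Status/ {
        off = ($1 == "") ? 1 : 0   # a leading separator yields an empty $1
        print $(2 + off)
    }'
}

echo "  Status       : OK (97 ms)" | status_field   # prints OK
echo "Status       : OK (97 ms)" | status_field     # prints OK
```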
    

    Awk and leading or trailing field separators

    For the default field separator, leading and trailing blanks are ignored. With a custom field separator, leading and trailing separator characters are not ignored, so a separator at the start of a line produces an empty first field. This is documented in the POSIX standard:

    1. If FS is a null string, the behavior is unspecified.

    2. If FS is a single character:

      a. If FS is <space>, skip leading and trailing <blank> and <newline> characters; fields shall be delimited by sets of one or more <blank> or <newline> characters.

      b. Otherwise, if FS is any other character c, fields shall be delimited by each single occurrence of c.

    3. Otherwise, the string value of FS shall be considered to be an extended regular expression. Each occurrence of a sequence matching the extended regular expression shall delimit fields.
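
The rules quoted above can be checked with a small experiment. This is a sketch rather than anything from the original answer: `show_split` is a hypothetical helper that prints the field count and the first field for a given FS.

```shell
# Hypothetical helper: report NF and $1 for a given field separator.
# usage: show_split FS LINE
show_split() {
    printf '%s\n' "$2" | awk -v FS="$1" '{ print NF ": [" $1 "]" }'
}

show_split ' '    '  a  b  '   # rule 2a, default FS:    2: [a]
show_split ':'    ':a:b'       # rule 2b, single char:   3: []
show_split '[ ]+' '  a  b  '   # rule 3, regex:          4: []
```

Only the default separator skips the leading and trailing blanks. In the other two cases the separator at the start of the line delimits an empty first field, and with the regular expression the trailing run of spaces adds an empty last field as well.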