
Different results in awk when using different FS syntax


I have a sample file which contains the following.

logging.20160309.113.txt.log:  0 Rows successfully loaded.
logging.20160309.1180.txt.log:  0 Rows successfully loaded.
logging.20160309.1199.txt.log:  0 Rows successfully loaded.

I am familiar with two ways of setting the field separator in awk, but they are giving me different results.

For the longest time I have used

"FS=" syntax when my FS is more than one character.

"-F" flag when my FS is just one character.

I would like to understand why the FS= syntax gives me the unexpected result shown below. Somehow the first record is not split on the new separator.

$ head -3 reload_list | awk -F"\.log\:" '{ print $1 }'
awk: warning: escape sequence `\.' treated as plain `.'
awk: warning: escape sequence `\:' treated as plain `:'
logging.20160309.113.txt
logging.20160309.1180.txt
logging.20160309.1199.txt
$ head -3 reload_list |  awk '{ FS="\.log\:" } { print $1 }'
awk: warning: escape sequence `\.' treated as plain `.'
awk: warning: escape sequence `\:' treated as plain `:'
logging.20160309.113.txt.log:
logging.20160309.1180.txt
logging.20160309.1199.txt

Solution

  • You are getting different results because, in the case where you set FS inside the awk program, the assignment is not in a BEGIN block. By the time it runs, the first record has already been read and split into fields using the default separator (whitespace), so the new FS only takes effect from the second record onward.

    Setting with -F

    $ awk -F'\\.log:' '{ print $1 }' b.txt
    logging.20160309.113.txt
    logging.20160309.1180.txt
    logging.20160309.1199.txt
    

    Setting FS after parsing first record

    $ awk '{ FS= "\\.log:"} { print $1 }' b.txt
    logging.20160309.113.txt.log:
    logging.20160309.1180.txt
    logging.20160309.1199.txt
    

    Setting FS before parsing any records

    $ awk 'BEGIN { FS= "\\.log:"} { print $1 }' b.txt
    logging.20160309.113.txt
    logging.20160309.1180.txt
    logging.20160309.1199.txt
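
Two related options may also be worth knowing; this is a minimal sketch, not the only way to do it. The -v option assigns FS before any input is read, much like a BEGIN block, and assigning $0 to itself re-splits the current record with whatever FS is in effect at that moment. The helper name `sample` and the variables `via_v` and `resplit` are hypothetical, introduced only for illustration; the separator is written as `[.]log:` so that no backslash escaping is needed.

```shell
# Sample records copied from the question.
sample() {
  printf '%s\n' \
    'logging.20160309.113.txt.log:  0 Rows successfully loaded.' \
    'logging.20160309.1180.txt.log:  0 Rows successfully loaded.' \
    'logging.20160309.1199.txt.log:  0 Rows successfully loaded.'
}

# -v assigns FS before any input is read, like a BEGIN block.
# [.] is a bracket expression that matches a literal dot.
via_v=$(sample | awk -v 'FS=[.]log:' '{ print $1 }')

# If FS has to change inside a main rule, "$0 = $0" re-splits the
# current record using the new FS before $1 is printed.
resplit=$(sample | awk '{ FS = "[.]log:"; $0 = $0 } { print $1 }')

printf '%s\n' "$via_v" "$resplit"
```

Both commands should print the three file names without the `.log:` suffix, including the first record.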
    

    I noticed this relevant passage in an awk manual. If you have seen different behavior before, or with a different implementation, this could explain why:

    According to the POSIX standard, awk is supposed to behave as if each record is split into fields at the time that it is read. In particular, this means that you can change the value of FS after a record is read, but before any of the fields are referenced. The value of the fields (i.e. how they were split) should reflect the old value of FS, not the new one.

    However, many implementations of awk do not do this. Instead, they defer splitting the fields until a field reference actually happens, using the current value of FS! This behavior can be difficult to diagnose.
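
To find out which of the two behaviors a particular awk has, a small probe along these lines may help. It is only a sketch: the input string and the variable names `probe` and `verdict` are made up for illustration. The rule changes FS and then references $1 for the first time on the same record.

```shell
# Change FS inside the main rule, then reference $1 for the first time.
probe=$(printf 'a:b c\n' | awk '{ FS = ":"; print $1 }')

case "$probe" in
  a:b) verdict='fields were split with the old FS (the POSIX behavior)' ;;
  a)   verdict='splitting was deferred and used the new FS' ;;
  *)   verdict='unexpected result' ;;
esac

printf '%s: %s\n' "$probe" "$verdict"
```

An awk that splits each record as it is read prints `a:b`, because the default whitespace separator was used. An awk that defers splitting prints `a`.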