I have a sample file that contains the following:
logging.20160309.113.txt.log: 0 Rows successfully loaded.
logging.20160309.1180.txt.log: 0 Rows successfully loaded.
logging.20160309.1199.txt.log: 0 Rows successfully loaded.
I am familiar with two ways of setting the field separator in awk. However, they are giving me different results.
For the longest time I have used
"FS=" syntax when my FS is more than one character.
"-F" flag when my FS is just one character.
I would like to understand why the FS= syntax is giving me an unexpected result, as seen below. Somehow the first record is not split on the new separator.
$ head -3 reload_list | awk -F"\.log\:" '{ print $1 }'
awk: warning: escape sequence `\.' treated as plain `.'
awk: warning: escape sequence `\:' treated as plain `:'
logging.20160309.113.txt
logging.20160309.1180.txt
logging.20160309.1199.txt
$ head -3 reload_list | awk '{ FS="\.log\:" } { print $1 }'
awk: warning: escape sequence `\.' treated as plain `.'
awk: warning: escape sequence `\:' treated as plain `:'
logging.20160309.113.txt.log:
logging.20160309.1180.txt
logging.20160309.1199.txt
The reason you are getting different results is that, in the case where you set FS in the awk program, it is not in a BEGIN block. So by the time you've set it, the first record has already been parsed into fields (using the default separator).
Setting with -F
$ awk -F"\\.log:" '{ print $1 }' b.txt
logging.20160309.113.txt
logging.20160309.1180.txt
logging.20160309.1199.txt
Setting FS after parsing first record
$ awk '{ FS= "\\.log:"} { print $1 }' b.txt
logging.20160309.113.txt.log:
logging.20160309.1180.txt
logging.20160309.1199.txt
Setting FS before parsing any records
$ awk 'BEGIN { FS= "\\.log:"} { print $1 }' b.txt
logging.20160309.113.txt
logging.20160309.1180.txt
logging.20160309.1199.txt
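To make the timing visible, here is a small sketch that prints NF next to $1 for each record. It assumes the three sample lines from the question are fed in with printf rather than read from a file, that the variable name out is purely illustrative, and that awk is a POSIX-conforming implementation such as gawk.

```shell
# Feed the three sample lines to awk and set FS in a main rule (not BEGIN).
# NF shows how many fields each record was split into.
out=$(printf '%s\n' \
  'logging.20160309.113.txt.log: 0 Rows successfully loaded.' \
  'logging.20160309.1180.txt.log: 0 Rows successfully loaded.' \
  'logging.20160309.1199.txt.log: 0 Rows successfully loaded.' |
  awk '{ FS = "\\.log:" } { print NF, $1 }')
printf '%s\n' "$out"
```

With a conforming awk, the first record should report 5 fields (it was split on whitespace before FS changed), while the second and third report 2 fields each (split on ".log:").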
I noticed this relevant bit in an awk manual. If perhaps you've seen different behavior previously or with a different implementation, this could explain why:
According to the POSIX standard, awk is supposed to behave as if each record is split into fields at the time that it is read. In particular, this means that you can change the value of FS after a record is read, but before any of the fields are referenced. The value of the fields (i.e. how they were split) should reflect the old value of FS, not the new one. However, many implementations of awk do not do this. Instead, they defer splitting the fields until a field reference actually happens, using the current value of FS! This behavior can be difficult to diagnose.
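The behavior described in that passage can be probed with a one-line input, as a rough sketch. The sample line 'a:b c' and the variable names as_read and resplit are made up for illustration. The second command relies on the fact that assigning to $0 makes awk split the record again with the current FS, which is one way to get the new separator applied to the current record if that is what you actually want.

```shell
# One input line containing both a colon and a space, so the two
# separators give visibly different first fields.
line='a:b c'

# FS is changed after the record has been read. A POSIX-conforming awk
# keeps the whitespace split for this record, so $1 is "a:b".
as_read=$(printf '%s\n' "$line" | awk '{ FS = ":"; print $1 }')

# Assigning $0 to itself re-splits the current record with the new FS,
# so $1 becomes "a".
resplit=$(printf '%s\n' "$line" | awk '{ FS = ":"; $0 = $0; print $1 }')

printf 'without re-split: %s\n' "$as_read"
printf 'with $0 = $0:     %s\n' "$resplit"
```

An implementation that defers field splitting would instead give "a" in both cases, which is the difference the manual warns about.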