I was reading the definition of the PROCINFO built-in variable in the GNU Awk User's Guide → 7.5.2 Built-in Variables That Convey Information:
PROCINFO
The elements of this array provide access to information about the running awk program. The following elements (listed alphabetically) are guaranteed to be available:
PROCINFO["FS"]
This is "FS" if field splitting with FS is in effect, "FIELDWIDTHS" if field splitting with FIELDWIDTHS is in effect, "FPAT" if field matching with FPAT is in effect, or "API" if field splitting is controlled by an API input parser.
And yes, it works very well. See this example, where I provide the string "hello;you" and set, in turn, FS to ";", FIELDWIDTHS to "2 2 2", and FPAT to three characters:
$ gawk 'BEGIN{FS=";"}{print PROCINFO["FS"]; print $1}' <<< "hello;you"
FS
hello
$ gawk 'BEGIN{FIELDWIDTHS="2 2 2"}{print PROCINFO["FS"]; print $1}' <<< "hello;you"
FIELDWIDTHS
he
$ gawk 'BEGIN{FPAT="..."}{print PROCINFO["FS"]; print $1}' <<< "hello;you"
FPAT
hel
This is fine and works very well.
Then, a bit earlier, in 4.8 Checking How gawk Is Splitting Records, they mention:
In order to tell which kind of field splitting is in effect, use PROCINFO["FS"] (see section Built-in Variables That Convey Information). The value is "FS" if regular field splitting is being used, "FIELDWIDTHS" if fixed-width field splitting is being used, or "FPAT" if content-based field splitting is being used.
Also, in Changing FS Does Not Affect the Fields, they describe how such changes only affect the next record:
According to the POSIX standard, awk is supposed to behave as if each record is split into fields at the time it is read. In particular, this means that if you change the value of FS after a record is read, the values of the fields (i.e., how they were split) should reflect the old value of FS, not the new one.
This case explains it very well:
$ gawk 'BEGIN{FS=";"} {FS="|"; print $1}' <<< "hello;you
bye|everyone"
hello # "hello;you" is split using FS=";"; the assignment FS="|" doesn't affect it yet
bye # "bye|everyone" is split using FS="|"
Taking all of this into consideration, I would assume that PROCINFO["FS"] always reflects the kind of field splitting that was applied to the record it is printed in.
However, see this case:
$ gawk 'BEGIN{FPAT="..."}{FS=";"; print PROCINFO["FS"]; print $1}' <<< "hello;you"
FS
hel
PROCINFO["FS"] shows the value set while processing the current record (FS), not the one that Awk actually used to split that record (that is, FPAT). The same occurs if we swap the assignments:
$ gawk 'BEGIN{FS=";"}{FPAT="..."; print PROCINFO["FS"]; print $1}' <<< "hello;you"
FPAT
hello
Why is PROCINFO["FS"] showing a different field-splitting mode than the one that is being used in the record it is printed in?
Field splitting (using FS, FIELDWIDTHS, or FPAT) occurs when a record is read or when $0 as a whole is otherwise given a new value (e.g. by $0="foo" or sub(/foo/,"bar")). print PROCINFO["FS"] tells you the value that PROCINFO["FS"] currently has, which is not necessarily the value it had when field splitting last occurred.
With:
$ gawk 'BEGIN{FPAT="..."}{FS=";"; print PROCINFO["FS"]; print $1}' <<< "hello;you"
FS
hel
You're setting FS=";" after $1 has already been populated based on FPAT="...", then printing the new value of PROCINFO["FS"] (which will be used the next time a record is split into fields), then printing the value of $1, which was populated before you set FS=";".
If you set $0 to itself, field splitting will occur again, this time using the new FS value rather than the original FPAT value:
$ gawk 'BEGIN{FPAT="..."}{FS=";"; print PROCINFO["FS"]; print $1; $0=$0; print $1}' <<< "hello;you"
FS
hel
hello