bash, awk, cygwin

how can i make awk process the BEGIN block for each file it parses?


i have an awk script that i'm running against a pair of files. i'm calling it like this:

awk -f script.awk file1 file2

script.awk looks something like this:

BEGIN {FS=":"}
{ if( NR == 1 )
    { 
      var=$2
      FS=" "
    }
   else print var,"|",$0
}

the first line of each file is colon-delimited. for every other line, i want it to return to the default whitespace field separator.

this works fine for the first file, but fails for the second: FS is not reset to : at the start of each file, because the BEGIN block is only processed once.

tldr: is there a way to make awk process the BEGIN block once for each file i pass it?

i'm running this on cygwin bash, in case that matters.


Solution

  • If you're using gawk version 4 or later there's the BEGINFILE block. From the manual:

    BEGINFILE and ENDFILE are additional special patterns whose bodies are executed before reading the first record of each command line input file and after reading the last record of each file. Inside the BEGINFILE rule, the value of ERRNO will be the empty string if the file could be opened successfully. Otherwise, there is some problem with the file and the code should use nextfile to skip it. If that is not done, gawk produces its usual fatal error for files that cannot be opened.

    For example:

    touch a b c
    awk 'BEGINFILE { print "Processing: " FILENAME }' a b c
    

    Output:

    Processing: a
    Processing: b
    Processing: c
    

    Edit - a more portable way

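    As noted by DennisWilliamson, you can achieve a similar effect with an FNR == 1 rule at the beginning of your script, since FNR restarts at 1 for every input file (unlike NR, which keeps counting across files). In addition to this, you could change FS directly from the command line, e.g.: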

    awk -f script.awk FS=':' file1 FS=' ' file2
    

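    Each assignment is applied when awk reaches it in the argument list, just before the following file is read. FS then keeps that value until it is assigned again, either by the script or by a later command-line assignment.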
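
To tie the portable suggestions back to the script in the question, here is a minimal sketch rather than a definitive fix. The sample file contents and the `parts` array name are made up for illustration. One caveat it works around: in POSIX awk, assigning FS inside a rule normally affects only the records read afterwards, so an `FNR == 1 { FS = ":" }` rule on its own would not re-split the header line that triggered it.

```shell
#!/bin/sh
# Hypothetical sample data: line 1 of each file is colon-delimited,
# the remaining lines are whitespace-delimited.
dir=$(mktemp -d)
printf 'header:alpha:x\none two\nthree four\n' > "$dir/file1"
printf 'header:beta:y\nfive six\n' > "$dir/file2"

# Approach 1: FNR == 1 matches the first line of every file.
# split() parses that line on ":" without touching FS, so every
# other line is still split on the default whitespace separator.
out_fnr=$(awk '
    FNR == 1 { split($0, parts, ":"); var = parts[2]; next }
    { print var, "|", $0 }
' "$dir/file1" "$dir/file2")

# Approach 2: let the script switch FS to " " after the header line,
# and reset it to ":" with a command-line assignment before each file.
out_cli=$(awk '
    FNR == 1 { var = $2; FS = " "; next }
    { print var, "|", $0 }
' FS=':' "$dir/file1" FS=':' "$dir/file2")

printf '%s\n' "$out_fnr"
rm -rf "$dir"
```

Both approaches use FNR rather than NR, because NR == 1 is true only for the very first line of all input. With gawk 4 or later, a `BEGINFILE { FS = ":" }` rule should be able to take the place of the command-line assignments in the second approach, since it runs before the first record of each file is read.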