bash, if-statement, awk, logic

awk: dealing with variables in an if statement


I am trying to understand how awk deals with variables in if statements.

Here is a toy text file:

$ cat myscript.sh 
#! /bin/bash

set -eu
set -o pipefail

IFS=$'\n\t'

for arg in $@; do
    echo "do something with file $arg"
done

Now I want awk to print the length of the longest line in the file. I thought of doing this:

$  awk '{max = 0}{if (length($0) > max) {max = length($0)} else {}} END {print max}' myscript.sh 

But this prints the length of the last line. However, when I run the following:

awk '{if (length($0) > max) {max = length($0)} else {}}END{print max}' myscript.sh 

The outcome is correct: it prints the right length, 35.

I cannot really understand why, when I set the max variable before the if statement, the comparison does not behave as I expect. I am sure there is an easy explanation for the awk gurus, but I cannot see it.

Thank you


Solution

  • You could change the first command slightly to make it work:

    awk 'BEGIN{max = 0}{if (length($0) > max) {max = length($0)} else {}} END {print max}' myscript.sh 
    

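    This way, you initialize the variable max once, before any input is read. Without BEGIN, the action {max = 0} has no pattern, so it runs for every record and max is reset to 0 on every line. The comparison then only ever sees the current line, and END prints the length of the last one.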

    However, awk variables have default values that depend on the context. You can read this to understand the logic behind it.

    Variables in awk can be assigned either numeric or string values. The kind of value a variable holds can change over the life of a program. By default, variables are initialized to the empty string, which is zero if converted to a number.

    With this command:

    awk '{if (length($0) > max) {max = length($0)} else {}}END{print max}' myscript.sh 
    

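    Here awk does not assign anything to max before the first comparison. The variable is simply uninitialized, and because it is compared with length($0), which is a number, its value is treated as 0 on the first row.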
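
The three behaviours discussed in this answer can be checked side by side with a small sketch. It does not use myscript.sh: the sample() helper and its three-line input (lengths 2, 5 and 3) are made up for illustration, and the variable names are arbitrary. The "max + 0" in the last variant is an optional extra, not part of the original commands.

```shell
#!/bin/sh
# Hypothetical three-line input with line lengths 2, 5 and 3.
sample() { printf 'ab\nabcde\nabc\n'; }

# A pattern-less action runs for every record, so max is reset to 0 each
# time and END only sees the length of the last line.
reset_each_record=$(sample | awk '{ max = 0 } length($0) > max { max = length($0) } END { print max }')

# BEGIN runs once, before any input is read.
init_once=$(sample | awk 'BEGIN { max = 0 } length($0) > max { max = length($0) } END { print max }')

# No initialization at all: the unset max compares as 0 in a numeric
# comparison. "max + 0" forces a number, so empty input would print 0
# instead of an empty line.
uninitialized=$(sample | awk 'length($0) > max { max = length($0) } END { print max + 0 }')

# An unset variable compares equal to both 0 and the empty string.
unset_check=$(awk 'BEGIN { if (x == 0 && x == "") print "both" }')

echo "$reset_each_record $init_once $uninitialized $unset_check"
# prints: 3 5 5 both
```

The pattern form "condition { action }" used here is equivalent to the if statement in the original commands; the empty else {} branch is not needed.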