I am trying to understand how awk deals with variables in if statements.
Here is a toy text file:
$ cat myscript.sh
#! /bin/bash
set -eu
set -o pipefail
IFS=$'\n\t'
for arg in $@; do
echo "do something with file $arg"
done
Now I want awk to print the length of the longest line in the file.
My first attempt was:
$ awk '{max = 0}{if (length($0) > max) {max = length($0)} else {}} END {print max}' myscript.sh
But this prints the length of the last line instead. However, when I run the following:
awk '{if (length($0) > max) {max = length($0)} else {}}END{print max}' myscript.sh
The outcome is correct: it prints the right length, 35.
I cannot understand why, when I set the max variable before the if statement, the condition no longer works as expected.
I am sure there is an easy explanation for the awk gurus, but I cannot see it.
Thank you
You can change the first command slightly to make it work:
awk 'BEGIN{max = 0}{if (length($0) > max) {max = length($0)} else {}} END {print max}' myscript.sh
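This way, max is initialized only once, in the BEGIN block, before any input is read. Without BEGIN, {max = 0} is an action with no pattern, so awk runs it on every input line: max is reset to 0 on every row, the comparison that follows succeeds for any non-empty line, and at the end max holds the length of the last line.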
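To see the reset happening, here is a minimal sketch that traces max line by line. It uses a small made-up input ("aaaa", "bb", "c") instead of myscript.sh, and the function names trace_reset and trace_begin are hypothetical.

```shell
#!/bin/sh
# Sketch on a made-up three-line input; function names are hypothetical.

# {max = 0} has no pattern, so awk runs it on every line and max is reset each time.
trace_reset() {
    printf 'aaaa\nbb\nc\n' |
        awk '{max = 0} {if (length($0) > max) max = length($0); print NR ": max=" max}'
}

# BEGIN runs once, before the first line is read, so max is never reset.
trace_begin() {
    printf 'aaaa\nbb\nc\n' |
        awk 'BEGIN {max = 0} {if (length($0) > max) max = length($0); print NR ": max=" max}'
}

trace_reset   # prints "1: max=4", "2: max=2", "3: max=1", one per line
trace_begin   # prints "1: max=4", "2: max=4", "3: max=4", one per line
```

In the first trace max simply follows the current line's length, which is why the original command ends up printing the length of the last line.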
In fact, the initialization is not strictly needed, because awk gives every variable a default value whose meaning depends on the context. The awk documentation explains the logic behind it:
Variables in awk can be assigned either numeric or string values. The kind of value a variable holds can change over the life of a program. By default, variables are initialized to the empty string, which is zero if converted to a number.
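A quick way to check this behaviour is a BEGIN-only program that never assigns a variable. This is a minimal sketch; x and show_default are hypothetical names, and it assumes a POSIX-compatible awk.

```shell
#!/bin/sh
# Sketch: x is a hypothetical variable that is never assigned.
show_default() {
    awk 'BEGIN {
        printf "as string: [%s]\n", x    # empty string in a string context
        printf "as number: %d\n", x + 0  # zero in a numeric context
        if (x == 0)  print "x == 0 is true"
        if (x == "") print "x == \"\" is true"
    }'
}

show_default
```

Both comparisons succeed: against a number the uninitialized value acts as 0, and against a string it acts as "".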
With this command:
awk '{if (length($0) > max) {max = length($0)} else {}}END{print max}' myscript.sh
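Here max is never assigned before the first comparison, so it holds awk's uninitialized value: the empty string, which is 0 when treated as a number. Because it is compared with length($0), which is a number, the comparison is numeric and max behaves as 0 on the first row. After that, max only changes when a longer line is found, so at the end it holds the maximum length.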
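The same logic is often written in awk's pattern-action form, with the comparison as the pattern and no if statement at all. This is a minimal sketch, assuming a POSIX-compatible awk; the helper name longest_len is hypothetical. The + 0 in END is an addition of this sketch: with a plain print max, an empty file would print the uninitialized value as an empty line instead of 0.

```shell
#!/bin/sh
# Sketch; longest_len is a hypothetical helper name.
longest_len() {
    # The pattern selects only lines longer than the current max; max starts
    # uninitialized and compares as 0 against the number length($0).
    # "max + 0" forces a numeric result, so empty input prints 0, not an empty line.
    awk 'length($0) > max { max = length($0) } END { print max + 0 }' "$@"
}

# Usage: longest_len myscript.sh   (the question reports 35 for that file)
printf 'aaaa\nbb\nc\n' | longest_len   # prints 4
printf '' | longest_len                # prints 0
```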