
Why does awk "not in" array work just like awk "in" array?


Here's an awk script that attempts to compute the set difference of two files based on their first column:

BEGIN{
    OFS=FS="\t"
    file = ARGV[1]
    while (getline < file)
        Contained[$1] = $1
    delete ARGV[1]
    }
$1 not in Contained{
    print $0
}

Here is TestFileA:

cat
dog
frog

Here is TestFileB:

ee
cat
dog
frog

However, when I run the following command:

gawk -f Diff.awk TestFileA TestFileB

I get the output just as if the script had contained "in":

cat
dog
frog

While I am uncertain about whether "not in" is correct syntax for my intent, I'm very curious about why it behaves exactly the same way as when I wrote "in".


Solution

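  • I cannot find any documentation for an "element not in array" operator; awk has no not keyword.

    Try !(element in array) instead. The parentheses matter: ! binds more tightly than in, so !element in array would be parsed as (!element) in array.

    As for why it behaves like "in": awk sees not as an uninitialized variable, so not evaluates to the empty string. Writing two expressions side by side is string concatenation, which binds more tightly than in, so the pattern is parsed as ($1 not) in Contained:

    ($1 not) == ($1 "") == $1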
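
To see both behaviours side by side, here is a minimal sketch that recreates the two sample files from the question and runs each pattern. It is not the asker's script: for brevity it assumes the common NR == FNR idiom in place of the getline loop, and default whitespace field splitting in place of the tab separator (the sample files have a single column, so this should make no difference here).

```shell
# Recreate the sample files from the question.
printf 'cat\ndog\nfrog\n' > TestFileA
printf 'ee\ncat\ndog\nfrog\n' > TestFileB

# Pattern as written in the question: "not" is an unset variable, so
# ($1 not) is just $1 and this is an ordinary membership test.
awk 'NR == FNR { Contained[$1]; next } $1 not in Contained' TestFileA TestFileB
# prints:
# cat
# dog
# frog

# Negated membership test. The parentheses are required, because
# !$1 in Contained would be parsed as (!$1) in Contained.
awk 'NR == FNR { Contained[$1]; next } !($1 in Contained)' TestFileA TestFileB
# prints:
# ee
```

The second command prints only the first-column values of TestFileB that are absent from TestFileA, which appears to be what the original script was aiming for.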