Find multiple strings in a file scattered over multiple lines using awk

I am fairly new to awk and was looking for an awk "one-liner" that finds files containing all three of a given set of strings, where the strings are on different lines in the file(s). From various sites I pieced together the awk command below, which looks for the strings "dna-advant", "vty 5 15" and "vty 16 31". It should print the filename only if all three strings are found, and that works. But although it works, I do not understand how it works.

What is the purpose of FNR == 1? I also do not understand the last part:

... { r=1; print FILENAME; nextfile } END { exit 1-r }'

Can someone explain it to me? Is there a possible shorter way of doing the same? :-)

gawk 'FNR == 1 { s1 = s2 = s3 = 0 }
    /dna-advant/ { s1 = 1 }
    /vty 16 31/ { s2 = 1 }
    /vty 5 15/ { s3 = 1 }
    s1 && s2 && s3 { r=1; print FILENAME; nextfile }
    END { exit 1-r }' k*

I tried using grep, but I cannot use grep -P on my system. Since I am trying to get deeper into how awk works, I would really like to stick with this command. Hopefully someone here can explain how it works and possibly come up with a shorter version!

Solution

The FNR==1 means that the following piece of code gets executed on the first line of each file that is opened. That code just resets all the s1/s2/s3 flags as each file is opened.

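FNR is awk's built-in count of the records read so far from the current input file. Unlike NR, which keeps counting across all files, FNR restarts at 1 for each new file.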
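To see the per-file reset for yourself, here is a minimal sketch. The two sample files (one.txt and two.txt) and the temporary directory are made up for illustration; they are not part of the question.

```shell
# Create two small sample files in a temporary directory (hypothetical names).
tmpdir=$(mktemp -d)
printf 'a\nb\n' > "$tmpdir/one.txt"
printf 'c\nd\n' > "$tmpdir/two.txt"

# For every line, print the file name, the per-file record number (FNR)
# and the overall record number (NR).
fnr_demo=$(cd "$tmpdir" && awk '{ print FILENAME, FNR, NR }' one.txt two.txt)
printf '%s\n' "$fnr_demo"

rm -rf "$tmpdir"
```

The second column (FNR) goes 1, 2, 1, 2 while the third (NR) goes 1, 2, 3, 4, which is why FNR == 1 is true exactly once per non-empty file.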

The r variable exists purely to set the exit status. r is set to 1 as soon as any file matches, so exit 1-r returns 0 (success) if one or more matching files are found, or 1 (failure) if none are found (an unset r counts as 0). This allows it to be used like this:

if gawk ... k* ; then ...

Or like this:

gawk ...
status=$?
...
if [ $status -eq 1 ] ; then
   ...
fi
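As a rough illustration of the exit status, the sketch below runs the same awk program against one file that contains all three strings and one that does not. The file names (full.cfg, partial.cfg) are invented for the demo, and the fallback from gawk to plain awk assumes that awk also supports nextfile.

```shell
# Use gawk when it is installed, otherwise fall back to the system awk
# (assumption: that awk understands nextfile, as gawk does).
AWK=$(command -v gawk || command -v awk)

# Hypothetical sample files: only full.cfg contains all three strings.
tmpdir=$(mktemp -d)
printf 'dna-advant\nvty 5 15\nvty 16 31\n' > "$tmpdir/full.cfg"
printf 'dna-advant\n' > "$tmpdir/partial.cfg"

# The program from the question, kept in a variable so it can be run twice.
prog='FNR == 1 { s1 = s2 = s3 = 0 }
/dna-advant/ { s1 = 1 }
/vty 16 31/ { s2 = 1 }
/vty 5 15/ { s3 = 1 }
s1 && s2 && s3 { r = 1; print FILENAME; nextfile }
END { exit 1 - r }'

# Run it once per file and report the exit status each time.
result=$(for f in full.cfg partial.cfg; do
    if "$AWK" "$prog" "$tmpdir/$f" > /dev/null; then
        echo "$f: at least one match (exit status 0)"
    else
        echo "$f: no match (exit status $?)"
    fi
done)
printf '%s\n' "$result"

rm -rf "$tmpdir"
```

This should report exit status 0 for full.cfg and 1 for partial.cfg.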

If you don't need/use that construct/functionality, you can omit all the stuff pertaining to r and the entire END block, but it doesn't cost you much performance and could be useful at some point.
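A shorter variant without r and the END block might look like the sketch below. The sample files k1.cfg and k2.cfg are made up so the example can run anywhere, and the fallback to plain awk assumes it supports nextfile.

```shell
# Use gawk when it is installed, otherwise fall back to the system awk
# (assumption: that awk understands nextfile, as gawk does).
AWK=$(command -v gawk || command -v awk)

# Hypothetical sample files: only k1.cfg contains all three strings.
tmpdir=$(mktemp -d)
printf 'dna-advant\nline vty 5 15\nline vty 16 31\n' > "$tmpdir/k1.cfg"
printf 'dna-advant\nline vty 5 15\n' > "$tmpdir/k2.cfg"

# Same logic as the original, minus the r flag and the END block.
matches=$(cd "$tmpdir" && "$AWK" '
    FNR == 1       { s1 = s2 = s3 = 0 }        # new file: reset the flags
    /dna-advant/   { s1 = 1 }
    /vty 16 31/    { s2 = 1 }
    /vty 5 15/     { s3 = 1 }
    s1 && s2 && s3 { print FILENAME; nextfile } # all seen: report, skip the rest
' k*)
printf '%s\n' "$matches"

rm -rf "$tmpdir"
```

Only k1.cfg is printed. The trade-off is that this version exits with status 0 whether or not anything matched, so it can no longer drive an if test directly.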

The code is already pretty efficient: nextfile makes it stop reading each file as soon as all three strings have been seen, so it reads as little as possible, and it is easy to read.

You could probably build an alternative version with grep along these lines, but I wouldn't bother (and didn't):

grep -l 'dna-advant' k* | xargs grep -l 'vty 16 31' | xargs grep -l 'vty 5 15'

You can optimize this by searching for the least likely string first, so that the later grep calls have fewer files to check.
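For completeness, here is a small sketch of the grep pipeline run against made-up sample files (k1.cfg and k2.cfg are hypothetical). It assumes file names without spaces or other characters that xargs would split on.

```shell
# Hypothetical sample files: only k1.cfg contains all three strings.
tmpdir=$(mktemp -d)
printf 'dna-advant\nvty 5 15\nvty 16 31\n' > "$tmpdir/k1.cfg"
printf 'dna-advant\nvty 5 15\n' > "$tmpdir/k2.cfg"

# Each grep -l prints the names of files containing its string;
# xargs passes those names on as the file list for the next grep.
grep_matches=$(cd "$tmpdir" &&
    grep -l 'dna-advant' k* | xargs grep -l 'vty 16 31' | xargs grep -l 'vty 5 15')
printf '%s\n' "$grep_matches"

rm -rf "$tmpdir"
```

Here k2.cfg drops out at the second stage, so only k1.cfg reaches the end of the pipeline.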