linux awk sed grep less-unix

Using grep or another command to return the line number of a multiline pattern


I was using less to browse a very large text log file (15 GB) and wanted to search for a multiline pattern. After some investigation, it appears that less can only search for single-line patterns.

Is there a way to use grep or another command to return the line number of a multiline pattern?

The log has the following format, repeated hundreds of thousands of times:

Packet A
op_3b       : 001
ctrl_2b     : 01
ini_count   : 5

Packet F
op_3b       : 101
ctrl_2b     : 00
ini_count   : 4

Packet X
op_3b       : 010
ctrl_2b     : 11
ini_count   : 98

Packet CA
op_3b       : 100
ctrl_2b     : 01
ini_count   : 5

Packet LP
op_3b       : 001
ctrl_2b     : 00
ini_count   : 0

Packet ZZ
op_3b       : 111
ctrl_2b     : 01
ini_count   : 545

Packet QEA
op_3b       : 111
ctrl_2b     : 11
ini_count   : 0

What I want is for grep or some other command to return the line number at which this three-line pattern starts:

op_3b       : 001
ctrl_2b     : 00
ini_count   : 0

Solution

  • Suppose that the pattern is stored in a file named pattern, like this:

    $ cat pattern
    op_3b       : 001
    ctrl_2b     : 00
    ini_count   : 0
    

    Then, try:

    $ awk '$0 ~ pat' RS=  pat="$(cat pattern)" logfile
    Packet LP
    op_3b       : 001
    ctrl_2b     : 00
    ini_count   : 0
    

    How it works

    • RS=

      This sets the record separator RS to an empty string, which puts awk in paragraph mode: records are separated by one or more blank lines, so each packet becomes one record.

    • pat="$(cat pattern)"

      This tells awk to create an awk variable pat which contains the contents of the file pattern.

      If your shell is bash, then a slightly more efficient form of this command would be pat="$(<pattern)". (Don't use this unless you are sure that your shell is bash.)

    • $0 ~ pat

      This tells awk to print any record that matches the pattern.

      $0 is the contents of the current record. ~ tells awk to do a match between the text in $0 and the regular expression in pat.

      (If the contents of pattern included any characters with special meaning in a regex, they would need to be escaped. Your current example has none, so this is not a problem.)
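Because $0 ~ pat is an unanchored regex match, a record whose last line is "ini_count   : 05" would also match a pattern ending in "ini_count   : 0". The following is a minimal sketch of one way around both that and the escaping caveat: it uses awk's index() for a literal comparison and wraps both strings in newlines so that only whole lines match. The temporary directory, the shell variable names, and the two-packet sample log (including the made-up "05" value) are assumptions for illustration only.

```shell
#!/bin/sh
# Illustrative sketch: file locations, variable names, and sample data are assumptions.
tmp=$(mktemp -d)

# The three-line pattern from the question.
printf '%s\n' 'op_3b       : 001' 'ctrl_2b     : 00' 'ini_count   : 0' > "$tmp/pattern"

# Two packets: LP is the wanted one, ZZ differs only by "05" instead of "0".
printf '%s\n' \
  'Packet LP' 'op_3b       : 001' 'ctrl_2b     : 00' 'ini_count   : 0' '' \
  'Packet ZZ' 'op_3b       : 001' 'ctrl_2b     : 00' 'ini_count   : 05' > "$tmp/logfile"

# Regex match: unanchored, so the ZZ record matches as well.
regex_hits=$(awk '$0 ~ pat { print $1, $2 }' RS= pat="$(cat "$tmp/pattern")" "$tmp/logfile")

# Literal match: index() treats pat as plain text; the surrounding "\n"
# on both sides forces the pattern to line up with whole lines.
literal_hits=$(awk 'index("\n" $0 "\n", "\n" pat "\n") { print $1, $2 }' \
  RS= pat="$(cat "$tmp/pattern")" "$tmp/logfile")

echo "regex:   $regex_hits"
echo "literal: $literal_hits"

rm -rf "$tmp"
```

With this sample, the regex form reports both packets while the literal form reports only Packet LP.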

    Alternative style

    Some people prefer a different style for defining awk variables:

    $ awk -v RS=  -v pat="$(cat pattern)" '$0 ~ pat' logfile
    Packet LP
    op_3b       : 001
    ctrl_2b     : 00
    ini_count   : 0
    

    This produces the same output here.
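The two styles do differ in one respect that does not matter for the commands above but can matter elsewhere: a -v assignment takes effect before the BEGIN block runs, while an assignment written among the file operands takes effect only when awk reaches that argument, after BEGIN. A small sketch, with a hypothetical variable x chosen purely for illustration:

```shell
#!/bin/sh
# x is a made-up variable name used only to show the timing difference.

# Operand-style assignment: not yet applied when BEGIN runs, so x is empty.
late=$(awk 'BEGIN { print "[" x "]" }' x=hello /dev/null)

# -v assignment: applied before BEGIN runs, so x is already set.
early=$(awk -v x=hello 'BEGIN { print "[" x "]" }')

echo "operand form in BEGIN: $late"
echo "-v form in BEGIN:      $early"
```

Since the commands in this answer only use pat and RS in the main rule, never in BEGIN, either style is fine for them.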

    Displaying line numbers

    $ awk -F'\n' '$0 ~ pat{print "Line Number="n+1; print "Packet" $0} {n=n+NF-1}' RS='Packet'  pat="$(cat pattern)" logfile
    Line Number=20
    Packet LP
    op_3b       : 001
    ctrl_2b     : 00
    ini_count   : 0
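The RS='Packet' command counts newlines per record, so the number it prints is tied to where records begin, and a multi-character RS is handled as a regex by gawk and mawk but is left unspecified by POSIX. If what is wanted is the line on which the three-line pattern itself starts, one alternative is to read the log line by line and compare a sliding window of the last few lines against the pattern, using awk's own line counter. The sketch below is one possible way to do that, not the method used above; it assumes a non-empty pattern file, exact whitespace-for-whitespace matches, and uses a temporary directory and array names (want, buf) invented for illustration. The log is the sample from the question with Packet A on line 1.

```shell
#!/bin/sh
# Illustrative sketch: file locations and names are assumptions.
tmp=$(mktemp -d)

printf '%s\n' 'op_3b       : 001' 'ctrl_2b     : 00' 'ini_count   : 0' > "$tmp/pattern"

cat > "$tmp/logfile" <<'EOF'
Packet A
op_3b       : 001
ctrl_2b     : 01
ini_count   : 5

Packet F
op_3b       : 101
ctrl_2b     : 00
ini_count   : 4

Packet X
op_3b       : 010
ctrl_2b     : 11
ini_count   : 98

Packet CA
op_3b       : 100
ctrl_2b     : 01
ini_count   : 5

Packet LP
op_3b       : 001
ctrl_2b     : 00
ini_count   : 0

Packet ZZ
op_3b       : 111
ctrl_2b     : 01
ini_count   : 545

Packet QEA
op_3b       : 111
ctrl_2b     : 11
ini_count   : 0
EOF

line=$(awk '
  # First file (the pattern): remember its m lines. Assumes it is not empty.
  NR == FNR { want[++m] = $0; next }
  {
    # Second file (the log): keep only the last m lines in a ring buffer.
    buf[FNR % m] = $0
    if (FNR < m) next
    # Compare the window ending at this line with the pattern, line by line.
    # Appending "" forces a string comparison rather than a numeric one.
    for (i = 1; i <= m; i++)
      if ((buf[(FNR - m + i) % m] "") != (want[i] "")) next
    # All m lines matched: report the line where the window starts.
    print FNR - m + 1
  }
' "$tmp/pattern" "$tmp/logfile")

echo "pattern starts at line $line"

rm -rf "$tmp"
```

Only m lines are held in memory at a time, so the approach streams through a large file, and the reported number can be used directly in less, for example by typing 22g to jump to that line.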