bash awk sed grep

Select blocks of lines between $blockBEGIN and $blockEND containing specific pattern


I'm trying to select blocks containing a specific regexp pattern, inspired by this solution:

$ blockBEGIN='ID'
$ blockEND='Sector Size'
$ myPATTERN='Ready'
$ cat pdisks-simplified-20230825.log | sed -n "/$blockBEGIN/,/$blockEND/{/$blockEND/"'s/$/\x00/;p}' | grep -z "$myPATTERN" | grep -z -v "$blockEND" | tr -d '\x00'
$

But nothing shows up.

Sample input:

ID                              : 0:1:4
Status                          : Ok
State                           : Ready
Power Status                    : Spun Up
Bus Protocol                    : SAS
Media                           : HDD
Capacity                        : 3,725.50 GB (4000225165312 bytes)
Vendor ID                       : DELL(tm)
Product ID                      : ST4000NM0023
Serial No.                      : Z1Z6AAR9
Part Number                     : TH0529FG212334AI01AGA02
Sector Size                     : 512B
ID                              : 0:1:0
Status                          : Ok
State                           : Online
Power Status                    : Not Applicable
Bus Protocol                    : SATA
Media                           : SSD
Capacity                        : 372.00 GB (399431958528 bytes)
Vendor ID                       : DELL(tm)
Product ID                      : INTEL SSDSC2BX400G4R
Serial No.                      : BTHC721403F8400VGN
Part Number                     : CN065WJJIT200766014OA00
Sector Size                     : 512B

Here is a matching block from the pdisks-simplified-20230825.log file:

ID                              : 0:1:4
Status                          : Ok
State                           : Ready
Power Status                    : Spun Up
Bus Protocol                    : SAS
Media                           : HDD
Capacity                        : 3,725.50 GB (4000225165312 bytes)
Vendor ID                       : DELL(tm)
Product ID                      : ST4000NM0023
Serial No.                      : Z1Z6AAR9
Part Number                     : TH0529FG212334AI01AGA02
Sector Size                     : 512B
$

And here is a non-matching block from the pdisks-simplified-20230825.log file:

ID                              : 0:1:0
Status                          : Ok
State                           : Online
Power Status                    : Not Applicable
Bus Protocol                    : SATA
Media                           : SSD
Capacity                        : 372.00 GB (399431958528 bytes)
Vendor ID                       : DELL(tm)
Product ID                      : INTEL SSDSC2BX400G4R
Serial No.                      : BTHC721403F8400VGN
Part Number                     : CN065WJJIT200766014OA00
Sector Size                     : 512B
$

How can I do that?


Solution

  • Assumptions:

    • blocks are separated by a blank line (as in OP's original question)
    • all blocks start with ID and end with Sector Size (otherwise we'll need to add some more logic)
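
As for why the original pipeline prints nothing: every block contains the $blockEND line, so the final grep -z -v "$blockEND" stage appears to discard every NUL-terminated record, including the one that matched. A minimal sketch, assuming GNU sed and GNU grep and using a shortened, hypothetical sample file (the names demo and result are illustrative only), simply drops that stage:

```shell
# Hypothetical, shortened sample in the same format as OP's log
demo=$(mktemp)
cat > "$demo" <<'EOF'
ID                              : 0:1:4
State                           : Ready
Sector Size                     : 512B
ID                              : 0:1:0
State                           : Online
Sector Size                     : 512B
EOF

blockBEGIN='ID'
blockEND='Sector Size'
myPATTERN='Ready'

# Same pipeline as in the question, minus the "grep -z -v" stage.
# sed appends a NUL to each end-of-block line, grep -z keeps the
# NUL-terminated records that match, tr removes the NULs again
# ('\0' is the octal escape that POSIX tr documents).
result=$(sed -n "/$blockBEGIN/,/$blockEND/{/$blockEND/"'s/$/\x00/;p}' "$demo" |
         grep -z "$myPATTERN" |
         tr -d '\0')

printf '%s\n' "$result"
rm -f "$demo"
```

With this sample only the 0:1:4 block should remain, including its Sector Size line.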

    If awk is an acceptable solution:

    awk -v myptn="${myPATTERN}" 'BEGIN {RS=""} $0 ~ myptn' block.log
    

    Where:

    • -v myptn="${myPATTERN}" - populate the awk variable named myptn with the bash/OS variable's value
    • RS="" - enable awk's paragraph mode: records are separated by one or more blank lines
    • $0 ~ myptn - if the record matches the regex stored in the awk variable named myptn then print the record [NOTE: this matches anywhere within the block, so if OP needs to be more specific then we'll need to expand the code]
    • block.log contains both sample blocks provided by OP

    When myPATTERN="Ready" this generates:

    ID                              : 0:1:4
    Status                          : Ok
    State                           : Ready
    Power Status                    : Spun Up
    Bus Protocol                    : SAS
    Media                           : HDD
    Capacity                        : 3,725.50 GB (4000225165312 bytes)
    Vendor ID                       : DELL(tm)
    Product ID                      : ST4000NM0023
    Serial No.                      : Z1Z6AAR9
    Part Number                     : TH0529FG212334AI01AGA02
    Sector Size                     : 512B
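
If the match has to be limited to a single field rather than the whole block, one possible expansion is to split each paragraph-mode record into lines and test only the value of one label. This is a sketch under assumptions: the label State, the awk variable myfld, and the made-up "Ready to ship" status in the second block are illustrative and not part of OP's data.

```shell
# Hypothetical, shortened sample with blank-line separated blocks
log=$(mktemp)
cat > "$log" <<'EOF'
ID                              : 0:1:4
Status                          : Ok
State                           : Ready
Sector Size                     : 512B

ID                              : 0:1:0
Status                          : Ready to ship
State                           : Online
Sector Size                     : 512B
EOF

myPATTERN='Ready'

result=$(awk -v myfld='State' -v myptn="${myPATTERN}" '
BEGIN { RS = ""; FS = "\n" }                     # paragraph mode, one line per field
{
    for (i = 1; i <= NF; i++) {
        p = index($i, ":")                       # position of the first colon on the line
        if (p == 0) continue                     # skip lines without a label
        label = substr($i, 1, p - 1)
        sub(/[[:space:]]+$/, "", label)          # strip padding after the label
        value = substr($i, p + 1)
        if (label == myfld && value ~ myptn) {   # test only the chosen field
            print                                # print the whole block
            next
        }
    }
}' "$log")

printf '%s\n' "$result"
rm -f "$log"
```

Here the second block is skipped even though the text Ready occurs in it, because the pattern is only tested against the State line.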
    

    Another approach based on strings that delimit the start/end of a block:

    awk -v bstart="${blockBEGIN}" -v bend="${blockEND}" -v myptn="${myPATTERN}" '
    
    index($0,bstart)==1 { inblock = 1 }                              # if line starts with "bstart" then enable flag
    index($0,bend)==1   { if (block ~ myptn)                         # if line starts with "bend" then ...
                             print block                             # print current block (the "bend" line itself is not appended) and ...
                          block=""                                   # reset our
                          inblock=0                                  # variables
                        }
    
    inblock             { block = block (block ? ORS : "" ) $0 }     # if flag is set then append current line to variable "block"
    ' block.log
    

    NOTES:

    • assumes bstart and bend start at the 1st character of the line; otherwise replace the index() calls with an appropriate test to match bstart and bend
    • assumes bstart and bend are unique enough to only match one row in a block

    This generates:

    ID                              : 0:1:4
    Status                          : Ok
    State                           : Ready
    Power Status                    : Spun Up
    Bus Protocol                    : SAS
    Media                           : HDD
    Capacity                        : 3,725.50 GB (4000225165312 bytes)
    Vendor ID                       : DELL(tm)
    Product ID                      : ST4000NM0023
    Serial No.                      : Z1Z6AAR9
    Part Number                     : TH0529FG212334AI01AGA02
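
In the script above the line that starts with bend is tested before the append rule runs, so it never becomes part of the printed block. If the closing line should be printed as well, as in OP's expected block, one possible variant appends the current line first and checks for the end marker afterwards. This is a sketch rather than the original answer's code; the shortened sample file is hypothetical and, like OP's sample input, has no blank lines between blocks.

```shell
# Hypothetical, shortened sample: blocks follow each other directly
log=$(mktemp)
cat > "$log" <<'EOF'
ID                              : 0:1:4
State                           : Ready
Part Number                     : TH0529FG212334AI01AGA02
Sector Size                     : 512B
ID                              : 0:1:0
State                           : Online
Part Number                     : CN065WJJIT200766014OA00
Sector Size                     : 512B
EOF

blockBEGIN='ID'
blockEND='Sector Size'
myPATTERN='Ready'

result=$(awk -v bstart="${blockBEGIN}" -v bend="${blockEND}" -v myptn="${myPATTERN}" '
index($0,bstart)==1 { inblock = 1; block = "" }               # line starts with bstart: open a new block
inblock             { block = block (block ? ORS : "") $0 }   # append current line, end line included
inblock && index($0,bend)==1 {                                # line starts with bend: block is complete
    if (block ~ myptn)
        print block                                           # print the block only if it matches
    inblock = 0                                               # reset state
    block = ""
}' "$log")

printf '%s\n' "$result"
rm -f "$log"
```

With this sample the output should be the four lines of the 0:1:4 block, ending with its Sector Size line.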