Print complete text block between two markers using awk, only if the block does not contain a specific keyword


My file contains blocks in the following pattern:

....
BEGIN
any text1
any text2
END
....
BEGIN
any text3
garbage text
any text4
END
....
BEGIN
any text5
any text6
END
...

BEGIN and END are my markers. I want to extract all the text between the markers, but only if the block does not contain 'garbage text'. So I expect to extract the blocks below:

any text1
any text2

any text5
any text6

How do I do it in awk? I know I can do something like:

awk '/BEGIN/{f=1;next}/END/{f=0;}f' file.log

to extract the lines between the two markers, but how do I refine the results so that only blocks that do not contain 'garbage text' are printed?


Solution

  • $ awk '/END/{if (rec !~ /garbage text/) print rec} {rec=rec $0 ORS} /BEGIN/{rec=""}' file
    any text1
    any text2
    
    any text5
    any text6
    
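For readers new to this idiom, here is the same program as the first one-liner, spread over several lines with comments. It is only a restructuring for readability, not a different algorithm, and the wrapper name filter_blocks is hypothetical.

```shell
# Hypothetical wrapper name; pass one or more file names, or pipe input in.
filter_blocks() {
  awk '
    # On an END line, rec holds the lines since the last BEGIN
    # (assuming markers are properly paired), so test it and print it.
    /END/   { if (rec !~ /garbage text/) print rec }

    # Append every line, markers included, to the buffer.
            { rec = rec $0 ORS }

    # On a BEGIN line, discard the buffer (including the BEGIN just appended).
    /BEGIN/ { rec = "" }
  ' "$@"
}
```

Call it as filter_blocks file.log. Because rec already ends in ORS and print appends another ORS, each printed block is followed by an empty line, which is where the blank lines in the output come from.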

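    The above assumes every END is paired with a preceding BEGIN. With GNU awk, which supports a multi-character RS (and sets RT to the text that actually matched RS, so RT is empty for a trailing record with no END), you could alternatively do: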

    $ awk -v RS='END\n' '{sub(/.*BEGIN\n/,"")} RT!="" && !/garbage text/' file
    any text1
    any text2
    
    any text5
    any text6
    
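If the input might contain an END with no preceding BEGIN, the first one-liner would print whatever text came before that END as if it were a block. A flag-based variant avoids this by only collecting lines while a block is actually open. This is a minimal sketch, assuming each marker is a whole line exactly equal to BEGIN or END (stricter than the /BEGIN/ and /END/ substring matches above); the function name print_clean_blocks is hypothetical.

```shell
# Hypothetical helper; assumes each marker is a line containing exactly BEGIN or END.
print_clean_blocks() {
  awk '
    /^BEGIN$/ { inblock = 1; rec = ""; next }   # open a block with an empty buffer
    /^END$/   {
      # Print only if a block was actually open and it has no "garbage text".
      if (inblock && rec !~ /garbage text/) print rec
      inblock = 0
      next
    }
    inblock   { rec = rec $0 ORS }              # collect lines inside a block
  ' "$@"
}

# Demo: the stray END before the first BEGIN is ignored.
# Prints "any text1" followed by an empty line.
printf '%s\n' 'stray line' END BEGIN 'any text1' END | print_clean_blocks
```

A block that is opened by BEGIN but never closed before end of input is silently dropped, which seems the safer default for "print only complete blocks".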

    By the way, instead of:

    awk '/BEGIN/{f=1;next}/END/{f=0;}f' file.log
    

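    your original code can simply be the following, which needs no next because the tests are ordered so that neither marker line is printed: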

    awk '/END/{f=0} f; /BEGIN/{f=1}' file.log
    
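The position of the flag test relative to the assignments decides which marker lines get printed: putting f after the assignment that sets the flag includes that marker, putting it before excludes it. A small sketch of the four combinations follows; the sample function is hypothetical and only exists to generate input.

```shell
# Hypothetical input generator: one line of text on each side of a two-line block.
sample() { printf '%s\n' before BEGIN a b END after; }

sample | awk '/BEGIN/{f=1} f; /END/{f=0}'   # prints: BEGIN a b END  (both markers)
sample | awk '/END/{f=0} f; /BEGIN/{f=1}'   # prints: a b            (neither marker)
sample | awk '/BEGIN/{f=1} /END/{f=0} f'    # prints: BEGIN a b      (BEGIN only)
sample | awk 'f; /BEGIN/{f=1} /END/{f=0}'   # prints: a b END        (END only)
```

Each command prints the listed lines one per line. The second command is the form recommended above.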

    See "Printing with sed or awk a line following a matching pattern" for related idioms.