
Find and return blocks of lines containing a string


I have a big file of the following type:

key = asbh
some
lines
of
**text**

key = kafeia
some
more
**text**
and
additionally
more
**text**
and
more

key = lklfh
this
is
another
block

Note (in case it matters): the 'key' line itself never contains the string of interest ('text').
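To experiment with the commands in this thread, the sample can be written to a file with a here-document. This is a minimal sketch: the file name myfile.txt is taken from the attempts below, and running it overwrites any existing file with that name.

```shell
#!/usr/bin/env bash
# Write the sample input from the question to myfile.txt.
# The quoted 'EOF' delimiter stops the shell from expanding anything in the body.
cat > myfile.txt <<'EOF'
key = asbh
some
lines
of
**text**

key = kafeia
some
more
**text**
and
additionally
more
**text**
and
more

key = lklfh
this
is
another
block
EOF
```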

I define a block as all the lines from one line starting with "key" up to (but not including) the next such line, so this example has 3 blocks. I would like to return every block that contains the string 'text', i.e. the desired output is:

key = asbh
some
lines
of
**text**

key = kafeia
some
more
**text**
and
additionally
more
**text**
and
more

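The block definition above maps directly onto a loop with if statements. Below is a minimal pure-bash sketch of that idea; the function name print_blocks_with is hypothetical, and lines before the first key line (none in the sample) would be treated as one extra block. A read loop like this is slow on a big file, so it is mainly useful for seeing how the if logic works.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every block (a "key" line plus the lines up to
# the next "key" line) whose text contains the substring given as $1.
print_blocks_with() {
  local needle=$1 block="" line
  # Read one line at a time; "|| [[ -n $line ]]" keeps a final line without a newline.
  while IFS= read -r line || [[ -n $line ]]; do
    if [[ $line == key* ]]; then
      # A new block starts here: emit the previous block if it contains the needle.
      if [[ $block == *"$needle"* ]]; then
        printf '%s' "$block"
      fi
      block=""
    fi
    # Add the current line to the block being collected.
    block+=$line$'\n'
  done
  # The last block is not followed by a key line, so check it after the loop.
  if [[ $block == *"$needle"* ]]; then
    printf '%s' "$block"
  fi
}

# Usage: print_blocks_with text < myfile.txt
```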
I have tried several things and hope I am heading in the right direction, but I can't get any of them to work. These are my attempts:

  1. less myfile.txt | sed -n '/key/,/text/p' | less

    I believe this starts at the first line containing 'key' and just keeps going (so it returns a lot of irrelevant blocks) until it sees 'text' somewhere, and then stops. It is inspired by a similar question, but that one neither requires pulling out multiple blocks nor matching a pattern inside each block.

  2. less myfile.txt | grep -Pzl '(?s)^key([^key]|\n)*text' | less

    I thought this might be a better approach: if I could get it to work, I could probably extend it, since it currently only attempts to match the text between key and text (and not up to the next key).

  3. I tried to understand how if statements work, particularly in light of a related thread, but I am a Unix novice, so an explanation would be much appreciated.
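A small experiment on the sample helps show why attempts 1 and 2 misbehave. In this sketch, sample is a hypothetical function that prints the question's input, and the comments describe the behaviour of a standard sed range and a grep bracket expression on it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the sample input from the question.
sample() {
  printf '%s\n' 'key = asbh' some lines of '**text**' '' \
    'key = kafeia' some more '**text**' and additionally more '**text**' and more '' \
    'key = lklfh' this is another block
}

# Attempt 1: the range /key/,/text/ opens at a key line and closes at the next
# line matching text. Each block is therefore cut after its FIRST text line,
# and a block without text stays open until a later text line or end of input.
sample | sed -n '/key/,/text/p'

# Attempt 2: [^key] is a bracket expression matching ONE character that is not
# k, e or y; it does not mean "anything except the word key".
# Here "kite" and "yak" are rejected and only "box" is printed.
printf '%s\n' kite yak box | grep '^[^key]'
```

Two side notes: when less writes to a pipe rather than a terminal it simply copies the file through, so `sed -n '/key/,/text/p' myfile.txt` is equivalent to attempt 1; and `grep -l` prints only the names of matching files, never the matching lines.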


Solution

  • This might work for you (GNU sed):

    sed -n '/^key/!{H;$!d};x;/text/p' file
    

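    Turn off implicit printing with -n.

    If a line does not begin with key, append it to the hold space and, unless it is the last line, delete it (which ends the cycle without printing).

    Otherwise (a key line, or the last line of the file), swap the pattern space with the hold space: the block collected so far moves into the pattern space and the current line goes into the hold space. If the collected block matches text, print it.

    N.B. Because $!d does not delete the last line, the end-of-file case falls through to x and /text/p, so the final block is tested too. The hold and pattern spaces swap each time a key line is seen.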
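For readability, the same one-liner can be spread over several lines with comments, which GNU sed accepts inside a script. This is a sketch rather than the answerer's own code; the wrapper name blocks_with_text is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the GNU sed one-liner above. It reads the named
# files (or standard input) and prints every key block that contains "text".
blocks_with_text() {
  sed -n '
    # A line that does not start with "key": append it to the hold space and,
    # unless it is the last line, end the cycle here without printing.
    /^key/!{
      H
      $!d
    }
    # Reached on a key line or on the last line: swap pattern and hold space,
    # so the pattern space now holds the block collected so far.
    x
    # Print the block only if it contains "text".
    /text/p
  ' "$@"
}

# Usage: blocks_with_text myfile.txt
```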