
Deleting the block between two regex markers when a pattern is matched inside the block


Let's suppose the following structure:

  -   key1: value11
      key2:
      - value21
      - value22
      - value23
      key3: value31
      key4:
      - value41
      - value42
      key5: value51
  -   key1: value12
      key2:
      - value24
      - value25
      key3: value32
      key5: value52
  -   key1: value13
      key2:
      - value26
      key3: value33
      key4:
      - value43
      - value44
      - value45
      key5: value53

Is it possible to remove every block between (and including) the lines matching these begin and end marker regexes:

 - begin marker: '^[[:blank:]]{2}-[[:blank:]]{3}key1:[[:blank:]].+$'
 - end marker:   '^[[:blank:]]{6}key5:[[:blank:]].+$'

when a line inside the block matches the following regex?

matching pattern: '^[[:blank:]]{6}key3:[[:blank:]]value32$'

The goal is to obtain:

  -   key1: value11
      key2:
      - value21
      - value22
      - value23
      key3: value31
      key4:
      - value41
      - value42
      key5: value51
  -   key1: value13
      key2:
      - value26
      key3: value33
      key4:
      - value43
      - value44
      - value45
      key5: value53

The begin marker could also serve as an end marker, provided that its next occurrence (the first line of the following block) is not deleted along with the block.

I have tried several sed/awk approaches without success, such as this one, inspired by paragraph 4.21 of this post:

sed ':t
/^[[:blank:]]{2}-[[:blank:]]{3}key1:[[:blank:]].+$/,/^[[:blank:]]{6}key5:[[:blank:]].+$/ {      # For each line between these block markers
        /^[[:blank:]]{6}key5:[[:blank:]].+$/!{                                                  # If we are not at the /end/ marker
                $!{                                                                             # nor the last line of the file
                        N;                                                                      # add the Next line to the pattern space
                        bt
                }                                                                               # and branch (loop back) to the :t label
        }                                                                                       # This line matches the /end/ marker
        /^[[:blank:]]{6}key3:[[:blank:]]value32$/d;                                             # If /regex/ matches, delete the block
}' file
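The attempt above most likely fails for two reasons. Without -E (or -r), GNU sed reads the patterns as basic regular expressions, in which {2} and + are literal characters. And once N has appended lines, ^ and $ anchor to the start and end of the whole pattern space rather than to each line, so neither the end-marker test nor the key3 test can match as written. A minimal corrected sketch, assuming GNU sed and the exact indentation shown in the question; drop_blocks_sed is a hypothetical helper name:

```shell
# drop_blocks_sed is a hypothetical helper name; GNU sed is assumed.
drop_blocks_sed() {
  sed -E '
    # Begin marker: it stays the first line of the pattern space, so the
    # ^ anchor keeps working after N appends more lines.
    /^[[:blank:]]{2}-[[:blank:]]{3}key1:[[:blank:]]/{
      :t
      # Append lines until the end marker has been pulled in (or EOF).
      # Embedded lines are matched after a newline, not after ^.
      /\n[[:blank:]]{6}key5:[[:blank:]]/!{
        $!{
          N
          bt
        }
      }
      # Delete the whole block if any of its lines is the matching line.
      /\n[[:blank:]]{6}key3:[[:blank:]]value32(\n|$)/d
    }
  '
}
```

Usage: drop_blocks_sed < file. Blocks that are kept are printed unchanged, since sed auto-prints the accumulated multi-line pattern space at the end of each cycle.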

Solution

  • The file format looks like YAML, so why not use yq to filter it? Then you can simply run:

    yq -y '[ .[] | select (.key3 != "value32") ]' file
    

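    which outputs the following (note that yq re-serializes the YAML, so the original "-   key1" indentation becomes "- key1"):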

    - key1: value11
      key2:
      - value21
      - value22
      - value23
      key3: value31
      key4:
      - value41
      - value42
      key5: value51
    - key1: value13
      key2:
      - value26
      key3: value33
      key4:
      - value43
      - value44
      - value45
      key5: value53
    

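    You may need to install yq first, e.g. with pip install yq. This is the Python yq, a wrapper around jq (which must also be installed); the Go-based mikefarah/yq uses a different command-line syntax.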
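If installing yq is not an option, the question's remark that the begin marker can double as an end marker maps naturally onto awk: buffer each record from one key1 line to the next and print it only if none of its lines matched the pattern. A minimal sketch, assuming the exact indentation shown in the question; drop_blocks is a hypothetical helper name, and the {n} intervals are spelled out because some awk implementations (older mawk, for instance) do not support them:

```shell
# drop_blocks is a hypothetical helper name: it reads the YAML-like text on
# stdin and prints every record except those containing the matching line.
drop_blocks() {
  awk '
    # Begin marker (two blanks, "-", three blanks, "key1:"): a new record
    # starts, so emit the buffered previous record unless it was flagged.
    /^[[:blank:]][[:blank:]]-[[:blank:]][[:blank:]][[:blank:]]key1:[[:blank:]]/ {
      emit()
    }
    # Every line, including the marker itself, goes into the current record.
    { buf = buf $0 "\n" }
    # Matching pattern (six blanks, "key3: value32"): flag the record.
    /^[[:blank:]][[:blank:]][[:blank:]][[:blank:]][[:blank:]][[:blank:]]key3:[[:blank:]]value32$/ {
      drop = 1
    }
    # The last record has no following begin marker, so flush it here.
    END { emit() }
    function emit() {
      if (!drop) printf "%s", buf
      buf = ""
      drop = 0
    }
  '
}
```

Usage: drop_blocks < file. Unlike yq, this leaves the original formatting of the kept records untouched.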