Tags: xml, shell, sed, xmlstarlet

How to delete a matching block once a pattern is matched


Here is the file (named sample.xml):


<?xml version="1.0" encoding="UTF-8"?>
<configs>

    <blah1 value="ma">
      <tag3>100MB</tag3>
    </blah1>

    <blah1 value="ba">
      <tag3>20MB</tag3>
    </blah1>

     <blah2 value="*" version="1.0" result="true">
        <blah1 value="xyz">
          <blah1 value="uvw" result="true">
             <tag>4</tag>
          </blah1>
        </blah1>
     </blah2>

  <!-- This is tag with def value -->
  <blah2 value="*" version="2.0" result="true">
    <blah1 value="abc">
      <blah1 value="def" result="true">
        <tag2>on</tag2>
      </blah1>
    </blah1>
  </blah2>

</configs>

When a block contains the string value="def", I want to remove that entire block, from the opening <blah2> tag through the closing </blah2> tag.

I am not familiar with sed's hold space, but I found the following through Google, and it comes very close:

sed -n '/<blah2.*>/,/<\/blah2>/{
                                  H
                                  /<\/blah2>/ { 
                                        s/.*//;x
                                       /def/d
                                       p 
                                  }
                               }' sample.xml

Expected result:


<?xml version="1.0" encoding="UTF-8"?>
<configs>

    <blah1 value="ma">
      <tag3>100MB</tag3>
    </blah1>

    <blah1 value="ba">
      <tag3>20MB</tag3>
    </blah1>

     <blah2 value="*" version="1.0" result="true">
        <blah1 value="xyz">
          <blah1 value="uvw" result="true">
             <tag>4</tag>
          </blah1>
        </blah1>
     </blah2>

</configs>

Actual result (with the non-working sed command above):

     <blah2 value="*" version="1.0" result="true">
        <blah1 value="xyz">
          <blah1 value="uvw" result="true">
             <tag>4</tag>
          </blah1>
        </blah1>
     </blah2>
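
This output follows from the -n flag: it turns off sed's automatic printing, and the only p command sits inside the <blah2> range, so every line outside a <blah2> block is dropped. Below is a minimal sketch of a hold-space variant that keeps those lines, run against a cut-down copy of the sample. The temporary directory and the `hold_result` variable are assumptions made for the demo, not part of the original attempt.

```shell
#!/bin/sh
# Demo input: a shortened, hypothetical copy of sample.xml.
tmpdir=$(mktemp -d)
cat > "$tmpdir/sample.xml" <<'EOF'
<configs>
  <blah2 value="*" version="1.0" result="true">
    <blah1 value="uvw" result="true">
      <tag>4</tag>
    </blah1>
  </blah2>
  <blah2 value="*" version="2.0" result="true">
    <blah1 value="def" result="true">
      <tag2>on</tag2>
    </blah1>
  </blah2>
</configs>
EOF

# No -n here, so lines outside a <blah2> block are printed as usual.
hold_result=$(sed '
/<blah2.*>/,/<\/blah2>/ {
  # Append every line of the block to the hold space.
  H
  # Until the closing tag arrives, print nothing and read the next line.
  /<\/blah2>/!d
  # Closing tag: empty the pattern space, then swap in the collected block.
  s/.*//
  x
  # Drop the whole block if it contains the unwanted value.
  /value="def"/d
  # Otherwise strip the leading newline added by the first H and print.
  s/^\n//
}
' "$tmpdir/sample.xml")

printf '%s\n' "$hold_result"
rm -rf "$tmpdir"
```

The match is narrowed from /def/ to /value="def"/ so that a stray "def" elsewhere in a block does not trigger the deletion.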

Solution

  • This might work for you (GNU sed):

    sed '/<blah2.*>/{:a;N;/<\/blah2.*>/!ba;/value="def"/d}' file
    

    If a line matches <blah2.*>, gather up the following lines until one matches <\/blah2.*>, then test the gathered lines for the string value="def" and, if it is found, delete them. Blocks without that string are printed unchanged.
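
The same command can be written one sed command per line with comments, which may be easier to follow and should also be accepted by sed implementations that do not allow ; after a label. This is a sketch run against a cut-down copy of the sample; the temporary directory and the `result` variable are assumptions made for the demo.

```shell
#!/bin/sh
# Demo input: a shortened, hypothetical copy of sample.xml.
tmpdir=$(mktemp -d)
cat > "$tmpdir/sample.xml" <<'EOF'
<configs>
  <blah2 value="*" version="1.0" result="true">
    <blah1 value="uvw" result="true">
      <tag>4</tag>
    </blah1>
  </blah2>
  <blah2 value="*" version="2.0" result="true">
    <blah1 value="def" result="true">
      <tag2>on</tag2>
    </blah1>
  </blah2>
</configs>
EOF

result=$(sed '
/<blah2.*>/ {
  # Label marking the start of the gathering loop.
  :a
  # Append the next input line to the pattern space.
  N
  # Keep looping until the closing tag has been read.
  /<\/blah2.*>/!ba
  # The whole block is now in the pattern space: drop it if it matches.
  /value="def"/d
}
' "$tmpdir/sample.xml")

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

On the full sample.xml from the question, this leaves the <!-- This is tag with def value --> comment line in place, because that line sits outside the <blah2> block. It would have to be removed separately to match the expected result exactly.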