
(bash) Delete elements from xml if a value is missing


I'd like to remove some elements from a large XML file when a particular child element is missing.

I found a topic that shows how to extract the elements where the value is present, but not the other way around. The solution could use sed or xmlstarlet, but I can't figure it out.

 xmlstarlet ed -d '//eslXmlDto[.//itemAssociations]' < file1.xml >> file2.xml

Here is the file I have

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<screens>
    <screenXmlDto>
        <articleCodeType>EAN</articleCodeType>
        <creationDate>2017-04-25T12:23:18.746+02:00</creationDate>
        <domain>toto.tata</domain>
        <screenCode>16201000032884264000</screenCode>
        <itemAssociations>
            <itemCode>2118550000000</itemCode>
            <position>1</position>
        </itemAssociations>
    </screenXmlDto>
    <screenXmlDto>
        <articleCodeType>EAN</articleCodeType>
        <creationDate>2016-07-27T03:59:17.328+02:00</creationDate>
        <domain>toto.tata</domain>
        <screenCode>17201000030538183370</screenCode>
    </screenXmlDto>
    <screenXmlDto>
        <articleCodeType>EAN</articleCodeType>
        <creationDate>2016-07-26T12:28:20.815+02:00</creationDate>
        <domain>toto.tata</domain>
        <screenCode>17201000030538091000</screenCode>
        <itemAssociations>
            <itemCode>4008033444958</itemCode>
            <position>1</position>
        </itemAssociations>
    </screenXmlDto>
</screens>

Here is the output I want

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<screens>
    <screenXmlDto>
        <articleCodeType>EAN</articleCodeType>
        <creationDate>2017-04-25T12:23:18.746+02:00</creationDate>
        <domain>toto.tata</domain>
        <screenCode>16201000032884264000</screenCode>
        <itemAssociations>
            <itemCode>2118550000000</itemCode>
            <position>1</position>
        </itemAssociations>
    </screenXmlDto>
    <screenXmlDto>
        <articleCodeType>EAN</articleCodeType>
        <creationDate>2016-07-26T12:28:20.815+02:00</creationDate>
        <domain>toto.tata</domain>
        <screenCode>17201000030538091000</screenCode>
        <itemAssociations>
            <itemCode>4008033444958</itemCode>
            <position>1</position>
        </itemAssociations>
    </screenXmlDto>
</screens>

Solution

  • xmlstarlet solution:

    xmlstarlet ed -d '//screenXmlDto[not(itemAssociations)]' file1.xml
    
    • -d - delete action: removes every node matched by the XPath expression that follows
    • //screenXmlDto[not(itemAssociations)] - XPath expression that selects every screenXmlDto element, at any depth, that has no itemAssociations child element

    The output:

    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <screens>
      <screenXmlDto>
        <articleCodeType>EAN</articleCodeType>
        <creationDate>2017-04-25T12:23:18.746+02:00</creationDate>
        <domain>toto.tata</domain>
        <screenCode>16201000032884264000</screenCode>
        <itemAssociations>
          <itemCode>2118550000000</itemCode>
          <position>1</position>
        </itemAssociations>
      </screenXmlDto>
      <screenXmlDto>
        <articleCodeType>EAN</articleCodeType>
        <creationDate>2016-07-26T12:28:20.815+02:00</creationDate>
        <domain>toto.tata</domain>
        <screenCode>17201000030538091000</screenCode>
        <itemAssociations>
          <itemCode>4008033444958</itemCode>
          <position>1</position>
        </itemAssociations>
      </screenXmlDto>
    </screens>
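
As a rough sketch of how the command above might be used in practice, the script below writes the filtered document to a new file and then counts the remaining elements as a sanity check. The temporary directory and the trimmed sample data are assumptions made for illustration; file1.xml and file2.xml are the names from the question, and xmlstarlet is assumed to be installed under that name.

```shell
#!/usr/bin/env bash
# Illustrative sketch: work in a throwaway directory so no real file is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Trimmed stand-in for the file in the question: one entry with
# itemAssociations and one without.
cat > file1.xml <<'EOF'
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<screens>
    <screenXmlDto>
        <screenCode>16201000032884264000</screenCode>
        <itemAssociations>
            <itemCode>2118550000000</itemCode>
            <position>1</position>
        </itemAssociations>
    </screenXmlDto>
    <screenXmlDto>
        <screenCode>17201000030538183370</screenCode>
    </screenXmlDto>
</screens>
EOF

# Delete every screenXmlDto that has no itemAssociations child and
# write the result to a new file (">" overwrites the target).
xmlstarlet ed -d '//screenXmlDto[not(itemAssociations)]' file1.xml > file2.xml

# Sanity check: count all remaining screenXmlDto elements, and those
# that still lack an itemAssociations child.
remaining=$(xmlstarlet sel -t -v 'count(//screenXmlDto)' file2.xml)
orphans=$(xmlstarlet sel -t -v 'count(//screenXmlDto[not(itemAssociations)])' file2.xml)
echo "remaining=$remaining orphans=$orphans"
```

Note that > replaces the contents of file2.xml, whereas the >> used in the question appends, which would leave two XML documents in the file if it already had content. If modifying the file directly is acceptable, xmlstarlet ed also has an -L (--inplace) option.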