I'd like to remove some elements from a big XML file if a value is missing.
I have found a topic that explains how to extract the elements where the value is present, but not the other way around. A solution could use sed or xmlstarlet, but I can't figure it out.
xmlstarlet ed -d '//eslXmlDto[.//itemAssociations]' < file1.xml >> file2.xml
Here is the file I have
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<screens>
<screenXmlDto>
<articleCodeType>EAN</articleCodeType>
<creationDate>2017-04-25T12:23:18.746+02:00</creationDate>
<domain>toto.tata</domain>
<screenCode>16201000032884264000</screenCode>
<itemAssociations>
<itemCode>2118550000000</itemCode>
<position>1</position>
</itemAssociations>
</screenXmlDto>
<screenXmlDto>
<articleCodeType>EAN</articleCodeType>
<creationDate>2016-07-27T03:59:17.328+02:00</creationDate>
<domain>toto.tata</domain>
<screenCode>17201000030538183370</screenCode>
</screenXmlDto>
<screenXmlDto>
<articleCodeType>EAN</articleCodeType>
<creationDate>2016-07-26T12:28:20.815+02:00</creationDate>
<domain>toto.tata</domain>
<screenCode>17201000030538091000</screenCode>
<itemAssociations>
<itemCode>4008033444958</itemCode>
<position>1</position>
</itemAssociations>
</screenXmlDto>
</screens>
Here is the output I want
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<screens>
<screenXmlDto>
<articleCodeType>EAN</articleCodeType>
<creationDate>2017-04-25T12:23:18.746+02:00</creationDate>
<domain>toto.tata</domain>
<screenCode>16201000032884264000</screenCode>
<itemAssociations>
<itemCode>2118550000000</itemCode>
<position>1</position>
</itemAssociations>
</screenXmlDto>
<screenXmlDto>
<articleCodeType>EAN</articleCodeType>
<creationDate>2016-07-26T12:28:20.815+02:00</creationDate>
<domain>toto.tata</domain>
<screenCode>17201000030538091000</screenCode>
<itemAssociations>
<itemCode>4008033444958</itemCode>
<position>1</position>
</itemAssociations>
</screenXmlDto>
</screens>
xmlstarlet solution:
xmlstarlet ed -d '//screenXmlDto[not(itemAssociations)]' file1.xml
-d - delete action
//screenXmlDto[not(itemAssociations)] - XPath expression that selects all screenXmlDto nodes which don't have an itemAssociations node as a child
The output:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<screens>
<screenXmlDto>
<articleCodeType>EAN</articleCodeType>
<creationDate>2017-04-25T12:23:18.746+02:00</creationDate>
<domain>toto.tata</domain>
<screenCode>16201000032884264000</screenCode>
<itemAssociations>
<itemCode>2118550000000</itemCode>
<position>1</position>
</itemAssociations>
</screenXmlDto>
<screenXmlDto>
<articleCodeType>EAN</articleCodeType>
<creationDate>2016-07-26T12:28:20.815+02:00</creationDate>
<domain>toto.tata</domain>
<screenCode>17201000030538091000</screenCode>
<itemAssociations>
<itemCode>4008033444958</itemCode>
<position>1</position>
</itemAssociations>
</screenXmlDto>
</screens>
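If xmlstarlet is not available, the same filter can be approximated with Python's standard library. This is only a minimal sketch, not part of the xmlstarlet solution: the function name drop_without_child and the shortened SAMPLE document are made up for illustration, and the element names are taken from the question.

```python
import xml.etree.ElementTree as ET


def drop_without_child(root, tag="screenXmlDto", required="itemAssociations"):
    """Remove every <tag> element that has no direct <required> child.

    Mirrors the XPath //screenXmlDto[not(itemAssociations)] used with
    xmlstarlet. Returns the number of removed elements.
    """
    removed = 0
    # Snapshot the tree first so removals do not disturb the traversal.
    for parent in list(root.iter()):
        for child in list(parent):
            # find() with a bare tag name only looks at direct children.
            if child.tag == tag and child.find(required) is None:
                parent.remove(child)
                removed += 1
    return removed


# Shortened, hypothetical stand-in for file1.xml.
SAMPLE = """<screens>
  <screenXmlDto>
    <screenCode>A</screenCode>
    <itemAssociations><itemCode>1</itemCode></itemAssociations>
  </screenXmlDto>
  <screenXmlDto>
    <screenCode>B</screenCode>
  </screenXmlDto>
  <screenXmlDto>
    <screenCode>C</screenCode>
    <itemAssociations><itemCode>2</itemCode></itemAssociations>
  </screenXmlDto>
</screens>"""

root = ET.fromstring(SAMPLE)
removed = drop_without_child(root)
remaining = [s.findtext("screenCode") for s in root.iter("screenXmlDto")]
print(removed, remaining)
```

For a real file, ET.parse("file1.xml") followed by tree.write("file2.xml", encoding="UTF-8", xml_declaration=True) should work along the same lines. Note that ElementTree loads the whole document into memory, so for a very large file the xmlstarlet command may remain the more practical choice.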