xml regex sed

Using sed to delete node and data from XML file


I am trying to clean up an XML file using sed.

I need to remove every <DistanceMeters> element, for example <DistanceMeters>123.123</DistanceMeters>.

I've been trying to use this command, without success:

sed 's/(<DistanceMeters>)[.]*?(<\/DistanceMeters>)/ /g' file.txc

Example node:

<Trackpoint><Time>2014-02-12T18:18:49+11:00</Time>
<Position><LatitudeDegrees>35.209656</LatitudeDegrees><LongitudeDegrees>28.99924</LongitudeDegrees></Position>
<AltitudeMeters>586.99994</AltitudeMeters>
<DistanceMeters>148.30713</DistanceMeters>
<Cadence>4</Cadence>
<Extensions><TPX xmlns="http://www.garmin.com/xmlschemas/ActivityExtension/v2" CadenceSensor="Bike"><Speed>0.043145742</Speed></TPX></Extensions></Trackpoint>

To make things a little more confusing, the source file is all on a single line.

Thanks.


Solution

  • If each DistanceMeters element is on a separate line, just do:

    awk '!/DistanceMeters/' file
    <Trackpoint><Time>2014-02-12T18:18:49+11:00</Time>
    <Position><LatitudeDegrees>35.209656</LatitudeDegrees><LongitudeDegrees>28.99924</LongitudeDegrees></Position>
    <AltitudeMeters>586.99994</AltitudeMeters>
    <Cadence>4</Cadence>
    <Extensions><TPX xmlns="http://www.garmin.com/xmlschemas/ActivityExtension/v2" CadenceSensor="Bike"><Speed>0.043145742</Speed></TPX></Extensions></Trackpoint>
    

    To remove it from within a line (for example, when the whole file is on a single line), you can do:

    awk '{gsub(/<DistanceMeters>[^>]*>/,"")}1' file
    

    Or with sed:

    sed 's/<DistanceMeters>[^>]*>//g' file
    

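    Neither pattern is greedy in effect: [^>]* cannot match past the next >, so each match ends at its own closing tag. Lines with multiple occurrences of <DistanceMeters> blocks are therefore left intact, as opposed to using the greedy .*, which would delete everything from the first opening tag to the last closing tag on the line.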
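The difference can be checked with a small sketch. The sample line below is made up for illustration (the tags A, B and C are placeholders, not taken from the original file), and the variable names are arbitrary. It compares the negated-class pattern from the answer with a greedy .* variant on a single line that contains two DistanceMeters blocks:

```shell
# Hypothetical one-line input with two DistanceMeters blocks (not from the real file).
line='<A>1</A><DistanceMeters>148.30713</DistanceMeters><B>2</B><DistanceMeters>150.5</DistanceMeters><C>3</C>'

# Negated class: [^>]* cannot cross a ">", so each match stops at its own closing tag.
bounded=$(printf '%s\n' "$line" | sed 's/<DistanceMeters>[^>]*>//g')

# Greedy .*: matches from the first opening tag through the LAST closing tag on the line.
greedy=$(printf '%s\n' "$line" | sed 's/<DistanceMeters>.*<\/DistanceMeters>//g')

printf 'bounded: %s\n' "$bounded"
printf 'greedy:  %s\n' "$greedy"
```

With this input, the bounded pattern should keep the B element between the two removed blocks, while the greedy one also swallows it. Note that the negated-class pattern relies on the element containing only text; if DistanceMeters ever had child elements, an XML-aware tool would be a safer choice.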