I am trying to clean up an XML file using sed.
I need to remove all <DistanceMeters>123.123</DistanceMeters> elements.
I've been trying to use this command, without success:
sed 's/(<DistanceMeters>)[.]*?(<\/DistanceMeters>)/ /g' file.txc
Example node:
<Trackpoint><Time>2014-02-12T18:18:49+11:00</Time>
<Position><LatitudeDegrees>35.209656</LatitudeDegrees><LongitudeDegrees>28.99924</LongitudeDegrees></Position>
<AltitudeMeters>586.99994</AltitudeMeters>
<DistanceMeters>148.30713</DistanceMeters>
<Cadence>4</Cadence>
<Extensions><TPX xmlns="http://www.garmin.com/xmlschemas/ActivityExtension/v2" CadenceSensor="Bike"><Speed>0.043145742</Speed></TPX></Extensions></Trackpoint>
To make things a little more confusing, the source file is all on a single line.
Thanks.
If each DistanceMeters element is on its own line, just delete the lines that contain it:
awk '!/DistanceMeters/' file
<Trackpoint><Time>2014-02-12T18:18:49+11:00</Time>
<Position><LatitudeDegrees>35.209656</LatitudeDegrees><LongitudeDegrees>28.99924</LongitudeDegrees></Position>
<AltitudeMeters>586.99994</AltitudeMeters>
<Cadence>4</Cadence>
<Extensions><TPX xmlns="http://www.garmin.com/xmlschemas/ActivityExtension/v2" CadenceSensor="Bike"><Speed>0.043145742</Speed></TPX></Extensions></Trackpoint>
To remove it from inside a text block, you can do:
awk '{gsub(/<DistanceMeters>[^>]*>/,"")}1' file
Or with sed:
sed 's/<DistanceMeters>[^>]*>//g' file
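Neither pattern over-matches: [^>]* cannot cross a >, so each match ends at the > of its own closing tag. This means lines with multiple <DistanceMeters> blocks are handled correctly, as opposed to using the greedy .*, which would delete everything from the first opening tag to the last closing tag on the line.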
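To see the difference on a single-line file, here is a small sketch. The input line is made up for illustration (the tags <a>, <b> and <c> are placeholders, not from the original file), and it assumes a POSIX-style sed:

```shell
# Hypothetical one-line input containing two DistanceMeters blocks.
line='<a>1</a><DistanceMeters>1.5</DistanceMeters><b>2</b><DistanceMeters>2.5</DistanceMeters><c>3</c>'

# Greedy .* runs from the first opening tag to the LAST closing tag,
# so the <b> element in between is removed as well.
greedy=$(printf '%s\n' "$line" | sed 's/<DistanceMeters>.*<\/DistanceMeters>//g')

# [^>]* cannot cross a '>', so each match ends at its own closing tag.
bounded=$(printf '%s\n' "$line" | sed 's/<DistanceMeters>[^>]*>//g')

printf 'greedy:  %s\n' "$greedy"
printf 'bounded: %s\n' "$bounded"
```

The greedy version should leave only <a>1</a><c>3</c>, while the bounded version keeps <b>2</b> too. Note that this relies on the element containing plain text only; if a DistanceMeters element ever had child tags, the [^>]* pattern would stop too early.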