
Removing Elements from a KML (Python)


I generated a KML file using Python's SimpleKML library and the following script, the output of which is also shown below:

import simplekml
kml = simplekml.Kml()
ground = kml.newgroundoverlay(name='Aerial Extent')
ground.icon.href = 'C:\\Users\\mdl518\\Desktop\\aerial_image.png'
ground.latlonbox.north = 46.55537
ground.latlonbox.south = 46.53134
ground.latlonbox.east = 48.60005
ground.latlonbox.west = 48.57678
ground.latlonbox.rotation = 0.090320 
kml.save(".//aerial_extent.kml")

The output KML:

<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2" xmlns:gx="http://www.google.com/kml/ext/2.2">
    <Document id="1">
        <GroundOverlay id="2">
            <name>Aerial Extent</name>
            <Icon id="3">
                <href>C:\Users\mdl518\Desktop\aerial_image.png</href>
            </Icon>
            <LatLonBox>
                <north>46.55537</north>
                <south>46.53134</south>
                <east>48.60005</east>
                <west>48.57678</west>
                <rotation>0.090320</rotation>
            </LatLonBox>
        </GroundOverlay>
    </Document>
</kml>

However, I am trying to remove the Document element from this KML, since SimpleKML generates it by default, while keeping its child elements (e.g. GroundOverlay). Is there also a way to remove the id attributes from specific elements (e.g. GroundOverlay and Icon)? I am exploring ElementTree/lxml for this, but they seem more geared toward generic XML files than KML. Here's what I'm trying in order to modify the KML, but it does not remove the Document element:

from lxml import etree
# Read as bytes: lxml refuses a str that still carries an XML encoding declaration
tree = etree.fromstring(open("C:\\Users\\mdl518\\Desktop\\aerial_extent.kml", "rb").read())
for item in tree.xpath("//Document[@id='1']"):
    item.getparent().remove(item)

print(etree.tostring(tree, pretty_print=True))

Here is the final desired output XML:

<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2" xmlns:gx="http://www.google.com/kml/ext/2.2">
    <GroundOverlay>
        <name>Aerial Extent</name>
        <Icon>
            <href>C:\Users\mdl518\Desktop\aerial_image.png</href>
        </Icon>
        <LatLonBox>
            <north>46.55537</north>
            <south>46.53134</south>
            <east>48.60005</east>
            <west>48.57678</west>
            <rotation>0.090320</rotation>
        </LatLonBox>
    </GroundOverlay>
</kml>

Any insights are most appreciated!


Solution

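  • You are getting tripped up by the dreaded namespaces. The root <kml> element declares a default namespace (xmlns="http://www.opengis.net/kml/2.2"), so Document and every other element in the file live in that namespace, and an unprefixed XPath such as //Document matches nothing.

    Try mapping a prefix to the KML namespace and using it in the expression, something like this: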

    ns = {'kml': 'http://www.opengis.net/kml/2.2'}
    for item in tree.xpath("//kml:Document[@id='1']",namespaces=ns):
        item.getparent().remove(item)
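A minimal way to see the namespace problem in isolation is sketched here. It assumes the standard library's xml.etree.ElementTree rather than lxml (so it runs without extra installs) and uses a trimmed, inline copy of the KML from the question; ElementTree's findall accepts the same kind of prefix-to-URI mapping as lxml's xpath.

```python
import xml.etree.ElementTree as ET

# Trimmed inline copy of the SimpleKML output from the question
KML = """<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2" xmlns:gx="http://www.google.com/kml/ext/2.2">
    <Document id="1">
        <GroundOverlay id="2">
            <name>Aerial Extent</name>
        </GroundOverlay>
    </Document>
</kml>"""

root = ET.fromstring(KML.encode("utf-8"))

# The default namespace is folded into every tag name ("Clark notation")
print(root.tag)  # {http://www.opengis.net/kml/2.2}kml

# An unprefixed path only matches elements in *no* namespace, so nothing is found
print(root.findall(".//Document"))  # []

# Mapping a prefix to the KML namespace makes the same search succeed
ns = {"kml": "http://www.opengis.net/kml/2.2"}
print(len(root.findall(".//kml:Document[@id='1']", ns)))  # 1
```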
    

    Edit:

    To remove just the Document element while keeping its GroundOverlay child (and everything inside it), try the following:

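    # Save a reference to the GroundOverlay before its parent Document is removed
    retain = tree.xpath("//kml:Document[@id='1']/kml:GroundOverlay",namespaces=ns)[0]
    for item in tree.xpath("//kml:Document[@id='1']",namespaces=ns):
        anchor = item.getparent()   # the root <kml> element
        anchor.remove(item)         # drop <Document> together with its subtree
        anchor.insert(1,retain)     # move the saved <GroundOverlay> under <kml>
    
    print(etree.tostring(tree, pretty_print=True).decode())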
    

    This should get you the desired output, except that the id attributes on GroundOverlay and Icon are still present.
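The answer above does not touch the id attributes. Below is a minimal sketch that handles both parts of the question (unwrapping Document while keeping all of its children, and deleting every id attribute). It assumes the standard library's xml.etree.ElementTree instead of lxml, Python 3.9+ for ET.indent, and a trimmed inline copy of the KML; unwrap_documents and strip_ids are hypothetical helper names, not part of any library.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"
# Serialize KML elements without a prefix (xmlns="..."), as in the original file
ET.register_namespace("", KML_NS)

# Trimmed inline copy of the question's KML (raw string so the backslashes survive)
KML = r"""<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2" xmlns:gx="http://www.google.com/kml/ext/2.2">
    <Document id="1">
        <GroundOverlay id="2">
            <name>Aerial Extent</name>
            <Icon id="3">
                <href>C:\Users\mdl518\Desktop\aerial_image.png</href>
            </Icon>
        </GroundOverlay>
    </Document>
</kml>"""


def unwrap_documents(root):
    """Replace every <Document> with its own children, in place (hypothetical helper)."""
    doc_tag = f"{{{KML_NS}}}Document"
    # Snapshot the elements first because the tree is mutated inside the loop
    for parent in list(root.iter()):
        for doc in [child for child in parent if child.tag == doc_tag]:
            idx = list(parent).index(doc)
            parent.remove(doc)
            # Re-insert the Document's children where the Document used to be
            for offset, child in enumerate(list(doc)):
                parent.insert(idx + offset, child)


def strip_ids(root):
    """Delete the id attribute from every element that has one (hypothetical helper)."""
    for elem in root.iter():
        elem.attrib.pop("id", None)


root = ET.fromstring(KML.encode("utf-8"))
unwrap_documents(root)
strip_ids(root)
ET.indent(root, space="    ")  # re-indent after moving elements (Python 3.9+)
print(ET.tostring(root, encoding="unicode"))
```

ElementTree only writes namespace declarations that are actually used, so the unused xmlns:gx declaration is dropped from the output. To save a file that keeps the XML declaration, `ET.ElementTree(root).write("aerial_extent.kml", encoding="UTF-8", xml_declaration=True)` should work. If you would rather stay with lxml, `etree.strip_tags(tree, "{http://www.opengis.net/kml/2.2}Document")` merges a Document's children into its parent and `etree.strip_attributes(tree, "id")` removes the ids, both in place.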