Search code examples
pythonxmlminidomparsexml

Parse xml element into list in Python


I've been trying for days to store the list of coordinates as a list in python. I've tried minidom, xmlreader, and a few others. Obviously I'm doing something wrong, but now I'm on a deadline and need some help.

<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2" xmlns:gx="http://www.google.com/kml/ext/2.2" xmlns:kml="http://www.opengis.net/kml/2.2" xmlns:atom="http://www.w3.org/2005/Atom">
<Document>
    <name>KmlFile</name>
    <Style id="s_ylw-pushpin_hl">
        <IconStyle>
            <scale>1.3</scale>
            <Icon>
                <href>http://maps.google.com/mapfiles/kml/pushpin/ylw-pushpin.png</href>
            </Icon>
            <hotSpot x="20" y="2" xunits="pixels" yunits="pixels"/>
        </IconStyle>
    </Style>
    <StyleMap id="m_ylw-pushpin">
        <Pair>
            <key>normal</key>
            <styleUrl>#s_ylw-pushpin</styleUrl>
        </Pair>
        <Pair>
            <key>highlight</key>
            <styleUrl>#s_ylw-pushpin_hl</styleUrl>
        </Pair>
    </StyleMap>
    <Style id="s_ylw-pushpin">
        <IconStyle>
            <scale>1.1</scale>
            <Icon>
                <href>http://maps.google.com/mapfiles/kml/pushpin/ylw-pushpin.png</href>
            </Icon>
            <hotSpot x="20" y="2" xunits="pixels" yunits="pixels"/>
        </IconStyle>
    </Style>
    <Placemark>
        <name>Untitled Path</name>
        <styleUrl>#m_ylw-pushpin</styleUrl>
        <LineString>
            <tessellate>1</tessellate>
            <coordinates>
                -99.96592053692414,35.92662037784583,0 -99.96540056429473,35.92663981781373,0 -99.96498447443297,35.92665534578857,0 -99.96454642236132,35.9264185376019,0 -99.96402346559078,35.9263549197269,0 -99.96360918813228,35.92637088259819,0 -99.96308396826463,35.92638887185453,0 -99.96266233713877,35.92640278086895,0 -99.9620357745397,35.92642517810427,0 -99.96151465912494,35.92644409356279,0 -99.96109783080338,35.92645922131123,0 -99.96047099900818,35.92648135188922,0 -99.95994680842655,35.92649909796792,0 -99.95952930454531,35.9265138434105,0 -99.95911612668664,35.92653010087101,0 -99.95849547509812,35.92655424217563,0 -99.95797733772554,35.92657408147154,0 -99.95756296901538,35.9265900399819,0 -99.95714851772037,35.92660600192025,0 -99.95663020793583,35.92662589771953,0 -99.95621554016822,35.92664185272223,0 -99.95569708921856,35.9266617957949,0 -99.9458971751307,35.92652862005821,0 -99.93713210910761,35.92660840326551,0 -99.9348524336848,35.92677309501163,0 -99.93476503004912,35.93623300599234,0 -99.93485273755309,35.93831642341647,0 -99.93486826790806,35.93875635918398,0 -99.93488694286367,35.93928521294632,0 -99.93491814461646,35.94016891184617,0 -99.93483463865006,35.94079068185483,0 -99.93474529945293,35.94123501583059,0 -99.93476308407948,35.94167526964939,0 -99.93467039232556,35.94203189098033,0 -99.93468833773247,35.94247270846275,0 -99.93459097919218,35.94274358478956,0 -99.93450373314268,35.94328089340114,0 -99.93453485381977,35.94407983030283,0 -99.93455562086309,35.94461349899734,0 -99.93445839976054,35.94488509983607,0 -99.93448432730901,35.9455054713127,0 -99.93450287538879,35.94594914852631,0 -99.93451303722561,35.94621777016132,0 -99.93452308120096,35.94648690425281,0 -99.93454990776108,35.94720567931162,0 -99.93457547504839,35.94792922487444,0 -99.93458816303897,35.94829196348598,0 -99.93460086921257,35.94865516133368,0 -99.93461358979009,35.94901881052979,0 -99.93462632401268,35.94938292352997,0 -99.93465481251259,35.95010474750824,0 -99.93466639323499,35.95037342476436,0 -99.93468240925189,35.95083051103168,0 -99.9346984507018,35.95128832348286,0 -99.93460267632484,35.9516594253745,0 -99.93450677278406,35.95203100714456,0 -99.93429881280559,35.9523154456587,0 -99.93431336603591,35.95277901025096,0 -99.93433087184599,35.95333638334299,0 -99.93434841932388,35.95389496314264,0 -99.93436307001966,35.95436136837495,0 -99.93437774762351,35.95482861518102,0 -99.93406525480461,35.95530700961358,0 -99.93363184570134,35.95541444996656,0 -99.93320107293883,35.95561569086011,0 -99.93277268551296,35.95591099220704,0 -99.93233870660956,35.95601871006549,0 -99.93190710818399,35.95622047086368,0 -99.93147112384854,35.95633250542997,0 -99.93125396544431,35.95643630241383,0
            </coordinates>
        </LineString>
    </Placemark>
</Document>
</kml>

This is what I have right now.

 import xml.dom.minidom
 from xml.dom.minidom import Node    
 def get_coord_list_from_earth(filename):

    doc = xml.dom.minidom.parse(filename)

    for node in doc.getElementsByTagName("LineString"):
        coord_list = node.getElementsByName("coordinates")

    return coord_list

Solution

  • The idea is to find coordinates tag, get it's value, strip the value and split it by ,:

    from xml.dom import minidom
    
    data = """Your xml goes here"""
    
    doc = minidom.parseString(data)
    coordinates =  doc.getElementsByTagName('coordinates')[0].firstChild.nodeValue
    coordinates = coordinates.strip()
    
    print coordinates.split(',')
    

    prints:

    [u'-99.96592053692414', u'35.92662037784583', ..., u'35.95643630241383', u'0']
    

    Hope that helps.