Parsing a KML File and storing in a database with Python


I have four KML files, each containing multiple polygons. I would like to parse the files, extract the data, and store it in my database. After some research, I concluded that pyKML is the best tool for parsing KML.

One of my KML files looks like:

<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2" xmlns:gx="http://www.google.com/kml/ext/2.2" xmlns:kml="http://www.opengis.net/kml/2.2" xmlns:atom="http://www.w3.org/2005/Atom">
<Document>
    <name>RecAreaPolygons.TAB</name>
    <Schema name="RecAreaPolygons" id="S_RecAreaPolygons_SSSS">
        <SimpleField type="string" name="RecAreaName"><displayName>&lt;b&gt;RecAreaName&lt;/b&gt;</displayName>
</SimpleField>
        <SimpleField type="string" name="RecAreaCategory"><displayName>&lt;b&gt;RecAreaCategory&lt;/b&gt;</displayName>
</SimpleField>
        <SimpleField type="string" name="Province"><displayName>&lt;b&gt;Province&lt;/b&gt;</displayName>
</SimpleField>
        <SimpleField type="string" name="Comments"><displayName>&lt;b&gt;Comments&lt;/b&gt;</displayName>
</SimpleField>
    </Schema>
    <Style id="style1">
        <BalloonStyle>
            <text><![CDATA[<table border="0">
  <tr><td><b>RecAreaName</b></td><td>$[RecAreaPolygons/RecAreaName]</td></tr>
  <tr><td><b>RecAreaCategory</b></td><td>$[RecAreaPolygons/RecAreaCategory]</td></tr>
  <tr><td><b>Province</b></td><td>$[RecAreaPolygons/Province]</td></tr>
  <tr><td><b>Comments</b></td><td>$[RecAreaPolygons/Comments]</td></tr>
</table>
]]></text>
        </BalloonStyle>
        <PolyStyle>
            <color>ff00ff00</color>
        </PolyStyle>
    </Style>
    <Style id="falseColor">
        <BalloonStyle>
            <text><![CDATA[<table border="0">
  <tr><td><b>RecAreaName</b></td><td>$[RecAreaPolygons/RecAreaName]</td></tr>
  <tr><td><b>RecAreaCategory</b></td><td>$[RecAreaPolygons/RecAreaCategory]</td></tr>
  <tr><td><b>Province</b></td><td>$[RecAreaPolygons/Province]</td></tr>
  <tr><td><b>Comments</b></td><td>$[RecAreaPolygons/Comments]</td></tr>
</table>
]]></text>
        </BalloonStyle>
        <PolyStyle>
            <colorMode>random</colorMode>
        </PolyStyle>
    </Style>
    <Folder id="layer 0">
        <name>RecAreaPolygons</name>
        <Placemark>
            <name>Whistler</name>
            <styleUrl>#falseColor</styleUrl>
            <Style id="inline">
                <IconStyle>
                    <color>ff0000ff</color>
                    <colorMode>normal</colorMode>
                </IconStyle>
                <LineStyle>
                    <color>ff0000ff</color>
                    <colorMode>normal</colorMode>
                </LineStyle>
                <PolyStyle>
                    <color>ff0000ff</color>
                    <colorMode>normal</colorMode>
                </PolyStyle>
            </Style>
            <ExtendedData>
                <SchemaData schemaUrl="#S_RecAreaPolygons_SSSS">
                    <SimpleData name="RecAreaName">Whistler</SimpleData>
                    <SimpleData name="RecAreaCategory">World Class</SimpleData>
                    <SimpleData name="Province">BC</SimpleData>
                    <SimpleData name="Comments"></SimpleData>
                </SchemaData>
            </ExtendedData>
            <Polygon>
                <outerBoundaryIs>
                    <LinearRing>
                        <coordinates>
                            -123.052382,50.094969,0 -123.050613,50.07531199999999,0 -123.029976,50.05263099999998,0 -122.955094,50.045827,0 -122.909104,50.05565599999998,0 -122.869599,50.07871399999998,0 -122.835991,50.10895600000001,0 -122.826557,50.152805,0 -122.78496,50.26872300000001,0 -122.923014,50.26576299999998,0 -122.939174,50.18569200000002,0 -122.979858,50.17057199999998,0 -123.012877,50.151293,0 -123.050613,50.12483200000001,0 -123.053561,50.104419,0 -123.052382,50.094969,0 
                        </coordinates>
                    </LinearRing>
                </outerBoundaryIs>
            </Polygon>
        </Placemark>
//MULTIPLE OTHER PLACEMARKS


As mentioned, I installed pyKML and then ran the following code, with the goal of loading the data into a dataframe:

from pykml import parser

with open('RecAreaPolygons.kml', 'rb') as f:
    s = f.read()

root = parser.fromstring(s)
# Attribute access returns only the first matching child at each level.
print(root.Document.Folder.Placemark.Polygon.outerBoundaryIs.LinearRing.coordinates)

I'm able to print the first Placemark's coordinates, but how do I retrieve the rest and add each one to a dataframe?


Preferably, I'd want my output to look like:

          RecAreaName  RecAreaCategory  Province  Comments  Coordinates  
0            Whistler      World Class        BC            -123.052382,50.094969,0 -123.050613,50.07531199999999,0 -123.029976,50.05263099999998,0 -122.955094,50.045827,0 -122.909104,50.05565599999998,0 -122.869599,50.07871399999998,0 -122.835991,50.10895600000001,0 -122.826557,50.152805,0 -122.78496,50.26872300000001,0 -122.923014,50.26576299999998,0 -122.939174,50.18569200000002,0 -122.979858,50.17057199999998,0 -123.012877,50.151293,0 -123.050613,50.12483200000001,0 -123.053561,50.104419,0 -123.052382,50.094969,0 
1                       The rest of the entries
2            

Solution

  • You can iterate over the placemarks, collecting each one's SimpleData fields and coordinates in a dict and appending that dict to a list. Then create a dataframe from the list.

    If the KML has multiple folders, you will need to iterate over the folders first and then over the placemarks in each folder.
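The multi-folder case can be sketched as a nested loop. To keep this runnable without third-party packages, the sketch below swaps in the standard library's xml.etree.ElementTree instead of pyKML; the inline KML, the folder names, and the `placemark_names` helper are made up for illustration. Elements produced by pyKML are lxml elements and expose similar `iter` and `findall` methods, so the same pattern should carry over, but treat that as an assumption to verify.

```python
import xml.etree.ElementTree as ET

# Namespace prefix used by every element in a KML 2.2 document.
KML_NS = "{http://www.opengis.net/kml/2.2}"

# Tiny two-folder KML, invented for this example.
SAMPLE_KML = """<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
<Document>
  <Folder><name>A</name>
    <Placemark><name>Whistler</name></Placemark>
  </Folder>
  <Folder><name>B</name>
    <Placemark><name>Example Two</name></Placemark>
    <Placemark><name>Example Three</name></Placemark>
  </Folder>
</Document>
</kml>"""


def placemark_names(kml_text):
    """Return (folder name, placemark name) pairs across all folders."""
    root = ET.fromstring(kml_text.encode("utf-8"))
    pairs = []
    # Outer loop: every Folder anywhere in the document.
    for folder in root.iter(KML_NS + "Folder"):
        folder_name = folder.findtext(KML_NS + "name")
        # Inner loop: Placemarks that are direct children of this folder.
        for place in folder.findall(KML_NS + "Placemark"):
            pairs.append((folder_name, place.findtext(KML_NS + "name")))
    return pairs


print(placemark_names(SAMPLE_KML))
```

Keeping the folder name alongside each placemark makes it easy to add a column that records which folder a row came from.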

    from pykml import parser
    import pandas as pd

    with open('RecAreaPolygons.kml', 'r', encoding="utf-8") as f:
        root = parser.parse(f).getroot()

    places = []
    # Iterating over the Placemark attribute yields every sibling Placemark.
    for place in root.Document.Folder.Placemark:
        # One column per SimpleData field; empty elements come back as None.
        data = {item.get("name"): item.text for item in
                place.ExtendedData.SchemaData.SimpleData}
        # Raw "lon,lat,alt" tuples, with surrounding whitespace removed.
        coords = place.Polygon.outerBoundaryIs.LinearRing.coordinates.text.strip()
        data["Coordinates"] = coords
        places.append(data)
    df = pd.DataFrame(places)
    print(df)
    

    Output:

      RecAreaName RecAreaCategory Province Comments  Coordinates
    0    Whistler     World Class       BC     None  -123.052382,50.094969,0 -123.050613,50.07531...
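The question also asks about getting the rows into a database, which the loop above stops short of. A minimal sketch, assuming SQLite via the standard library's sqlite3 module: the table name `rec_areas`, the `store_places` helper, and the in-memory connection are all hypothetical choices, and the sample row uses a shortened coordinate string. The rows are shaped like the `places` list built in the loop.

```python
import sqlite3

# Rows shaped like the `places` list; the coordinate string is shortened here.
places = [
    {
        "RecAreaName": "Whistler",
        "RecAreaCategory": "World Class",
        "Province": "BC",
        "Comments": None,
        "Coordinates": "-123.052382,50.094969,0 -123.050613,50.07531199999999,0",
    },
]

COLUMNS = ["RecAreaName", "RecAreaCategory", "Province", "Comments", "Coordinates"]


def store_places(conn, rows):
    """Create the hypothetical rec_areas table and insert one row per placemark."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS rec_areas ("
        "RecAreaName TEXT, RecAreaCategory TEXT, Province TEXT, "
        "Comments TEXT, Coordinates TEXT)"
    )
    # Parameterized insert; missing keys and None values are stored as NULL.
    conn.executemany(
        "INSERT INTO rec_areas VALUES (?, ?, ?, ?, ?)",
        [tuple(row.get(col) for col in COLUMNS) for row in rows],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # use a file path instead to persist the data
store_places(conn, places)
print(conn.execute("SELECT RecAreaName, Province FROM rec_areas").fetchall())
```

Storing the coordinates as plain text keeps the sketch simple; a database with spatial support would likely want a proper geometry type instead.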
    

    If you want the coordinates as a list, append .split(' ') after the strip() call in the assignment of the coords variable inside the loop. Each list element is then one "longitude,latitude,altitude" string.
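Going one step further than a list of strings, the coordinate text can be turned into numeric tuples. The sketch below is one possible way to do it; `parse_coordinates` is a hypothetical helper, not part of pyKML. It assumes each whitespace-separated chunk is "lon,lat" or "lon,lat,alt", and it defaults a missing altitude to 0.0.

```python
def parse_coordinates(text):
    """Convert a KML coordinates string into a list of (lon, lat, alt) floats."""
    points = []
    # split() with no argument handles spaces, newlines and repeated whitespace.
    for chunk in text.split():
        parts = [float(value) for value in chunk.split(",")]
        lon, lat = parts[0], parts[1]
        # Altitude is optional in KML; fall back to 0.0 when it is absent.
        alt = parts[2] if len(parts) > 2 else 0.0
        points.append((lon, lat, alt))
    return points


sample = "  -123.052382,50.094969,0 -123.050613,50.07531199999999,0 \n"
print(parse_coordinates(sample))
```

Calling split() with no argument is slightly more forgiving than split(' '), because it does not produce empty strings when the file contains line breaks or double spaces between tuples.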