I have four KML files, each containing multiple polygons. I would like to parse them, extract the data, and store it in my database. After some research, I concluded that the best way to parse a KML file is with pyKML.
One of my KML files looks like:
<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2" xmlns:gx="http://www.google.com/kml/ext/2.2" xmlns:kml="http://www.opengis.net/kml/2.2" xmlns:atom="http://www.w3.org/2005/Atom">
<Document>
<name>RecAreaPolygons.TAB</name>
<Schema name="RecAreaPolygons" id="S_RecAreaPolygons_SSSS">
<SimpleField type="string" name="RecAreaName"><displayName><b>RecAreaName</b></displayName>
</SimpleField>
<SimpleField type="string" name="RecAreaCategory"><displayName><b>RecAreaCategory</b></displayName>
</SimpleField>
<SimpleField type="string" name="Province"><displayName><b>Province</b></displayName>
</SimpleField>
<SimpleField type="string" name="Comments"><displayName><b>Comments</b></displayName>
</SimpleField>
</Schema>
<Style id="style1">
<BalloonStyle>
<text><![CDATA[<table border="0">
<tr><td><b>RecAreaName</b></td><td>$[RecAreaPolygons/RecAreaName]</td></tr>
<tr><td><b>RecAreaCategory</b></td><td>$[RecAreaPolygons/RecAreaCategory]</td></tr>
<tr><td><b>Province</b></td><td>$[RecAreaPolygons/Province]</td></tr>
<tr><td><b>Comments</b></td><td>$[RecAreaPolygons/Comments]</td></tr>
</table>
]]></text>
</BalloonStyle>
<PolyStyle>
<color>ff00ff00</color>
</PolyStyle>
</Style>
<Style id="falseColor">
<BalloonStyle>
<text><![CDATA[<table border="0">
<tr><td><b>RecAreaName</b></td><td>$[RecAreaPolygons/RecAreaName]</td></tr>
<tr><td><b>RecAreaCategory</b></td><td>$[RecAreaPolygons/RecAreaCategory]</td></tr>
<tr><td><b>Province</b></td><td>$[RecAreaPolygons/Province]</td></tr>
<tr><td><b>Comments</b></td><td>$[RecAreaPolygons/Comments]</td></tr>
</table>
]]></text>
</BalloonStyle>
<PolyStyle>
<colorMode>random</colorMode>
</PolyStyle>
</Style>
<Folder id="layer 0">
<name>RecAreaPolygons</name>
<Placemark>
<name>Whistler</name>
<styleUrl>#falseColor</styleUrl>
<Style id="inline">
<IconStyle>
<color>ff0000ff</color>
<colorMode>normal</colorMode>
</IconStyle>
<LineStyle>
<color>ff0000ff</color>
<colorMode>normal</colorMode>
</LineStyle>
<PolyStyle>
<color>ff0000ff</color>
<colorMode>normal</colorMode>
</PolyStyle>
</Style>
<ExtendedData>
<SchemaData schemaUrl="#S_RecAreaPolygons_SSSS">
<SimpleData name="RecAreaName">Whistler</SimpleData>
<SimpleData name="RecAreaCategory">World Class</SimpleData>
<SimpleData name="Province">BC</SimpleData>
<SimpleData name="Comments"></SimpleData>
</SchemaData>
</ExtendedData>
<Polygon>
<outerBoundaryIs>
<LinearRing>
<coordinates>
-123.052382,50.094969,0 -123.050613,50.07531199999999,0 -123.029976,50.05263099999998,0 -122.955094,50.045827,0 -122.909104,50.05565599999998,0 -122.869599,50.07871399999998,0 -122.835991,50.10895600000001,0 -122.826557,50.152805,0 -122.78496,50.26872300000001,0 -122.923014,50.26576299999998,0 -122.939174,50.18569200000002,0 -122.979858,50.17057199999998,0 -123.012877,50.151293,0 -123.050613,50.12483200000001,0 -123.053561,50.104419,0 -123.052382,50.094969,0
</coordinates>
</LinearRing>
</outerBoundaryIs>
</Polygon>
</Placemark>
//MULTIPLE OTHER PLACEMARKS
As mentioned, I installed pyKML and ran the following code as a first step toward storing the data in a dataframe:
from pykml import parser

with open('RecAreaPolygons.kml', 'rb') as f:
    s = f.read()
root = parser.fromstring(s)
# Attribute access returns only the first matching child at each level
print(root.Document.Folder.Placemark.Polygon.outerBoundaryIs.LinearRing.coordinates)
I'm able to print the first Placemark's coordinates, but how do I retrieve the rest and iteratively add them to a dataframe?
Preferably, I'd want my output to look like:
RecAreaName RecAreaCategory Province Comments Coordinates
0 Whistler World Class BC -123.052382,50.094969,0 -123.050613,50.07531199999999,0 -123.029976,50.05263099999998,0 -122.955094,50.045827,0 -122.909104,50.05565599999998,0 -122.869599,50.07871399999998,0 -122.835991,50.10895600000001,0 -122.826557,50.152805,0 -122.78496,50.26872300000001,0 -122.923014,50.26576299999998,0 -122.939174,50.18569200000002,0 -122.979858,50.17057199999998,0 -123.012877,50.151293,0 -123.050613,50.12483200000001,0 -123.053561,50.104419,0 -123.052382,50.094969,0
1 The rest of the entries
2
You can iterate over the placemarks, collecting each one's extended data fields and coordinates in a dict and appending it to a list. Then create a dataframe from the list.
If the KML has multiple folders, you will need to iterate over the folders and then over the placemarks in each folder.
from pykml import parser
import pandas as pd

with open('RecAreaPolygons.kml', 'r', encoding="utf-8") as f:
    root = parser.parse(f).getroot()

places = []
for place in root.Document.Folder.Placemark:
    data = {item.get("name"): item.text for item in
            place.ExtendedData.SchemaData.SimpleData}
    coords = place.Polygon.outerBoundaryIs.LinearRing.coordinates.text.strip()
    data["Coordinates"] = coords
    places.append(data)

df = pd.DataFrame(places)
print(df)
Output:
RecAreaName RecAreaCategory Province Comments Coordinates
0 Whistler World Class BC None -123.052382,50.094969,0 -123.050613,50.07531...
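For the multi-folder case mentioned above, the nested loop might look like the sketch below. To keep it runnable without extra installs, it swaps pyKML for the standard library's xml.etree.ElementTree; the SAMPLE document, the extract_places helper, and the extra "Folder" key are all made up for illustration and are not part of the original file.

```python
import xml.etree.ElementTree as ET

# KML 2.2 namespace, as declared in the file from the question
KML_NS = {"kml": "http://www.opengis.net/kml/2.2"}

# Hypothetical two-folder document, trimmed down for illustration
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
<Document>
  <Folder><name>Layer A</name>
    <Placemark>
      <ExtendedData><SchemaData>
        <SimpleData name="RecAreaName">Whistler</SimpleData>
        <SimpleData name="Province">BC</SimpleData>
      </SchemaData></ExtendedData>
      <Polygon><outerBoundaryIs><LinearRing><coordinates>
        -123.05,50.09,0 -123.02,50.05,0 -123.05,50.09,0
      </coordinates></LinearRing></outerBoundaryIs></Polygon>
    </Placemark>
  </Folder>
  <Folder><name>Layer B</name>
    <Placemark>
      <ExtendedData><SchemaData>
        <SimpleData name="RecAreaName">Example Area</SimpleData>
        <SimpleData name="Province">AB</SimpleData>
      </SchemaData></ExtendedData>
      <Polygon><outerBoundaryIs><LinearRing><coordinates>
        -115.5,51.1,0 -115.4,51.0,0 -115.5,51.1,0
      </coordinates></LinearRing></outerBoundaryIs></Polygon>
    </Placemark>
  </Folder>
</Document>
</kml>"""


def extract_places(kml_text):
    """Return one dict per Placemark, visiting every Folder in turn."""
    root = ET.fromstring(kml_text.encode("utf-8"))
    places = []
    # Outer loop: every Folder in the document.
    # Placemarks that sit directly under Document are not visited here.
    for folder in root.iterfind(".//kml:Folder", KML_NS):
        folder_name = folder.findtext("kml:name", default="", namespaces=KML_NS)
        # Inner loop: the Placemarks that are direct children of this Folder
        for place in folder.iterfind("kml:Placemark", KML_NS):
            data = {"Folder": folder_name}
            for item in place.iterfind(".//kml:SimpleData", KML_NS):
                data[item.get("name")] = item.text
            coords = place.findtext(".//kml:coordinates", default="",
                                    namespaces=KML_NS)
            data["Coordinates"] = coords.strip()
            places.append(data)
    return places


places = extract_places(SAMPLE)
```

The resulting list of dicts can be passed to pd.DataFrame(places) exactly as in the pyKML version. With pyKML itself, the same structure should carry over by wrapping the Placemark loop in a loop over root.Document.Folder.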
If you want the coordinates as a list, add .split(' ') after the strip() call in the assignment to coords inside the loop.
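If numeric values are more useful than strings, the split can be taken one step further. The following is a minimal sketch, assuming each entry follows the KML "lon,lat[,alt]" layout; parse_coordinates is a hypothetical helper name, and it uses split() with no argument so that newlines or repeated spaces between tuples are also handled.

```python
def parse_coordinates(coords_text):
    """Turn a KML coordinates string into a list of (lon, lat, alt) floats."""
    points = []
    # split() with no argument splits on any run of whitespace
    for token in coords_text.split():
        parts = [float(value) for value in token.split(",")]
        lon, lat = parts[0], parts[1]
        # Altitude is optional in KML; default to 0.0 when it is absent
        alt = parts[2] if len(parts) > 2 else 0.0
        points.append((lon, lat, alt))
    return points


# First two vertices and the closing vertex of the Whistler ring
ring = parse_coordinates(
    "-123.052382,50.094969,0 -123.050613,50.07531199999999,0 "
    "-123.052382,50.094969,0"
)
```

In the loop, this would replace the plain string assignment, for example data["Coordinates"] = parse_coordinates(coords).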