I'm learning how to parse KML files in Python using the pyKML module. The specific file I'm using can be found here and I've also added it at the bottom of this post. I have saved the file on my computer and named it test.kml.
After some research, I managed to extract a specific portion of the test.kml file and save the result to a DataFrame. Here's my code:
from pykml import parser
import pandas as pd
filename = 'test.kml'
with open(filename) as fobj:
    folder = parser.parse(fobj).getroot().Document
plnm = []
for pm in folder.Placemark:
    plnm1 = pm.name
    plnm.append(plnm1.text)
df = pd.DataFrame()
df['name'] = plnm
print(df)
          name
0   Club house
1  By the lake
I would like to add a new column to my DataFrame corresponding to the value of "holeNumber". I have tried adding the following lines to my for loop, but without success.
for pm in folder.Placemark:
    plnm1 = pm.name
    val1 = pm.ExtendedData.holeNumber.value
    plnm.append(plnm1.text)
    val.append(val1.text)
I'm not sure how to access the value from that specific node. The resulting DataFrame I'm looking for is the following:
| name | holeNumber |
|-------------|------------|
| Club house | 1 |
| By the lake | 5 |
Any help would be appreciated.
<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <name>My Golf Course Example</name>
    <Placemark>
      <name>Club house</name>
      <ExtendedData>
        <Data name="holeNumber">
          <value>1</value>
        </Data>
        <Data name="holeYardage">
          <value>234</value>
        </Data>
        <Data name="holePar">
          <value>4</value>
        </Data>
      </ExtendedData>
      <Point>
        <coordinates>-111.956,33.5043</coordinates>
      </Point>
    </Placemark>
    <Placemark>
      <name>By the lake</name>
      <ExtendedData>
        <Data name="holeNumber">
          <value>5</value>
        </Data>
        <Data name="holeYardage">
          <value>523</value>
        </Data>
        <Data name="holePar">
          <value>5</value>
        </Data>
      </ExtendedData>
      <Point>
        <coordinates>-111.95,33.5024</coordinates>
      </Point>
    </Placemark>
  </Document>
</kml>
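Here's a quick way to parse the KML. In your file, holeNumber is the value of the name attribute on a Data element, not an element tag, so pm.ExtendedData.holeNumber does not exist. The children of ExtendedData are Data elements, and the holeNumber entry is the first one in each Placemark, so you can reach it with Data[0]: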
plnm = []
holeNumber = []
for pm in folder.Placemark:
    plnm1 = pm.name
    val1 = pm.ExtendedData.Data[0].value
    plnm.append(plnm1.text)
    holeNumber.append(val1.text)
df = pd.DataFrame()
df['name'] = plnm
df['holeNumber'] = holeNumber
print(df)
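Or, collecting the rows first and building the DataFrame in one step:
# DataFrame.append was removed in pandas 2.0, so gather the rows in a list
rows = []
for pm in folder.Placemark:
    name = pm.name.text
    value = pm.ExtendedData.Data[0].value.text
    rows.append({'name': name, 'holeNumber': value})
df = pd.DataFrame(rows, columns=['name', 'holeNumber'])
print(df)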
Output:
          name holeNumber
0   Club house          1
1  By the lake          5
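Note that Data[0] relies on holeNumber always being the first Data element. If the order might vary between Placemarks, selecting the element by its name attribute is safer. Below is a minimal sketch of that idea. It swaps pyKML for the standard library's xml.etree.ElementTree so that it runs without extra dependencies, and it embeds a trimmed copy of test.kml in which the Data elements of the first Placemark are deliberately reordered for illustration. The names NS, KML_TEXT and extract_rows are made up for this example.

```python
import xml.etree.ElementTree as ET

# Namespace declared on the <kml> root element of the file in the question
NS = {"kml": "http://www.opengis.net/kml/2.2"}

# Trimmed copy of test.kml; the first Placemark's Data elements are
# reordered on purpose (an assumption made for this illustration only)
KML_TEXT = """<kml xmlns="http://www.opengis.net/kml/2.2">
<Document>
  <Placemark>
    <name>Club house</name>
    <ExtendedData>
      <Data name="holeYardage"><value>234</value></Data>
      <Data name="holeNumber"><value>1</value></Data>
    </ExtendedData>
  </Placemark>
  <Placemark>
    <name>By the lake</name>
    <ExtendedData>
      <Data name="holeNumber"><value>5</value></Data>
      <Data name="holeYardage"><value>523</value></Data>
    </ExtendedData>
  </Placemark>
</Document>
</kml>"""


def extract_rows(kml_text, field="holeNumber"):
    """Return one dict per Placemark holding its name and the requested Data value."""
    root = ET.fromstring(kml_text)
    rows = []
    for pm in root.iterfind(".//kml:Placemark", NS):
        name = pm.findtext("kml:name", default=None, namespaces=NS)
        # Pick the Data element by its name attribute rather than by position;
        # the result is None if the Placemark has no such entry
        value = pm.findtext(
            f"kml:ExtendedData/kml:Data[@name='{field}']/kml:value",
            default=None,
            namespaces=NS,
        )
        rows.append({"name": name, field: value})
    return rows


rows = extract_rows(KML_TEXT)
print(rows)
```

Passing the resulting list to pd.DataFrame(rows) should give the same two-column frame as above. The values come back as strings, so a conversion such as df['holeNumber'].astype(int) may be needed if you want numbers.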