I am having difficulty parsing the layers of this KML file in R and Python. I have included a link to download the file from my Dropbox. The file was originally shared with me. I have been told that it originates at Distilleries Fighting Covid, but I couldn't figure out how to find it there.
What I want is to extract all layers and ultimately separate them into their own CSV files. The nodes that I want to retrieve are Name, Address, City, State, Zip. The closest that I have gotten is with the Stack post Read multiple layers of KML file using R.
For this first attempt, my code looks as follows:
library(rgdal)
allKmlLayers <- function(kmlfile){
  lyr <- ogrListLayers(kmlfile)
  mykml <- list()
  for (i in 1:length(lyr)){
    mykml[i] <- readOGR(kmlfile, lyr[i])
  }
  names(mykml) <- lyr
  return(mykml)
}
kmlfile <- "Distilleries and Hospitals.kml"
mykml <- allKmlLayers(kmlfile)
However, when doing so, I am getting the following error and warning:
Error in readOGR("Distilleries and Hospitals.kml", "Distilleries") :
  no features found
In addition: Warning message:
In ogrFIDs(dsn = dsn, layer = layer) : no features found
Now, I am able to read the layers stored in the lyr variable.
The code below will produce a list of 7.
lyr <- ogrListLayers("Distilleries and Hospitals.kml")
Next, I tried to pull from just the one layer with the following code:
mykml <- readOGR("Distilleries and Hospitals.kml", "Distilleries")
This resulted in the following error and warning (same as above):
Error in readOGR("Distilleries and Hospitals.kml", "Distilleries") :
  no features found
In addition: Warning message:
In ogrFIDs(dsn = dsn, layer = layer) : no features found
Finally, I tried a similar approach with lapply, using the sf package.
library(sf)
kmlfile <- "Distilleries and Hospitals.kml"
mykml <- lapply(lyr, function(i) st_read(kmlfile, i))
names(mykml) <- lyr
I get a list of 7 empty objects (0 rows by 3 columns each) with no information.
Any assistance with this would be wonderful.
One final note: if you do end up getting the file from the website instead, be aware that there are several places near the end of the file that R won't read (at least not for me) because of special characters. The error from the sf function will tell you where they are.
Thank you for your time on this.
KML File at Dropbox for Download (~28mb)
Edit 1: From a comment left below, it seems that the layers are empty in this file. If that is accurate, then the question is: how would I get the data I need out of this file and into a CSV file?
Edit 2:
On further investigation of the KML document, it appears that all of my information may be found within the Placemark tags (<Placemark>...</Placemark>). However, I am not certain how to pull that data out. This is the ultimate goal. If these are not layers, then it would be great if someone could point me in the right direction for solving this. Again, I want to thank you in advance for all of your help.
Edit 3 Data Excerpt and Python Attempt: I have manually manipulated the file to remove everything that I am not really interested in having in the long run. Below is a small excerpt of the file. It lists the first three companies.
<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
<Document>
<Folder>
<name>Distilleries</name>
<Placemark>
<name>Bomb City Enterprises</name>
<description><![CDATA[Address: 306 S Cleveland St<br>Address Line2: <br>City: Amarillo<br>Location: Alabama<br>State_Abbrev: AL<br>Postal Code: 79102<br>unnamed (1): <br>unnamed (2): <br>unnamed (3): <br>Updated 2020-04-12 20:30:13.383810: ]]></description>
<ExtendedData>
<Data name="Address">
<value>306 S Cleveland St</value>
</Data>
<Data name="Address Line2">
<value/>
</Data>
<Data name="City">
<value>Amarillo</value>
</Data>
<Data name="Location">
<value>Alabama</value>
</Data>
<Data name="State_Abbrev">
<value>AL</value>
</Data>
<Data name="Postal Code">
<value>79102</value>
</Data>
<Data name="unnamed (1)">
<value/>
</Data>
<Data name="unnamed (2)">
<value/>
</Data>
<Data name="unnamed (3)">
<value/>
</Data>
<Data name="Updated 2020-04-12 20:30:13.383810">
<value/>
</Data>
</ExtendedData>
</Placemark>
<Placemark>
<name>Cahaba Brewing Company</name>
<address>4500 5th Ave. S building C Birmingham Alabama AL 35222</address>
<description><![CDATA[Address: 4500 5th Ave. S<br>Address Line2: building C<br>City: Birmingham<br>Location: Alabama<br>State_Abbrev: AL<br>Postal Code: 35222<br>unnamed (1): <br>unnamed (2): <br>unnamed (3): <br>Updated 2020-04-12 20:30:13.383810: ]]></description>
<styleUrl>#icon-1517-0288D1</styleUrl>
<ExtendedData>
<Data name="Address">
<value>4500 5th Ave. S</value>
</Data>
<Data name="Address Line2">
<value>building C</value>
</Data>
<Data name="City">
<value>Birmingham</value>
</Data>
<Data name="Location">
<value>Alabama</value>
</Data>
<Data name="State_Abbrev">
<value>AL</value>
</Data>
<Data name="Postal Code">
<value>35222</value>
</Data>
<Data name="unnamed (1)">
<value/>
</Data>
<Data name="unnamed (2)">
<value/>
</Data>
<Data name="unnamed (3)">
<value/>
</Data>
<Data name="Updated 2020-04-12 20:30:13.383810">
<value/>
</Data>
</ExtendedData>
</Placemark>
<Placemark>
<name>Redmont Distilling Company</name>
<address>4550 5th Ave South building N Birmingham Alabama AL 35222</address>
<description><![CDATA[Address: 4550 5th Ave South<br>Address Line2: building N<br>City: Birmingham<br>Location: Alabama<br>State_Abbrev: AL<br>Postal Code: 35222<br>unnamed (1): <br>unnamed (2): <br>unnamed (3): <br>Updated 2020-04-12 20:30:13.383810: ]]></description>
<styleUrl>#icon-1517-0288D1</styleUrl>
<ExtendedData>
<Data name="Address">
<value>4550 5th Ave South</value>
</Data>
<Data name="Address Line2">
<value>building N</value>
</Data>
<Data name="City">
<value>Birmingham</value>
</Data>
<Data name="Location">
<value>Alabama</value>
</Data>
<Data name="State_Abbrev">
<value>AL</value>
</Data>
<Data name="Postal Code">
<value>35222</value>
</Data>
<Data name="unnamed (1)">
<value/>
</Data>
<Data name="unnamed (2)">
<value/>
</Data>
<Data name="unnamed (3)">
<value/>
</Data>
<Data name="Updated 2020-04-12 20:30:13.383810">
<value/>
</Data>
</ExtendedData>
</Placemark>
<Placemark>
Since I have had no luck with R, I have added my Python attempt below. However, with the added data, if someone is able to do this in R, I will be happy with that as well.
What I am trying to get is first the name. Then, from the ExtendedData section, I ultimately want Address 1, Address 2, City, State Abbreviation, and Zip. I am fine if I end up with everything, so long as it puts an empty field where there is no data. For example, Address 2 is often empty; in that case it should just return an empty field and keep moving so that when I merge the lists, everything lines up.
The example below only attempts to get Name and Address Line 1. I figure, if I can get this, then I should be able to extend it all the way.
The additional code that I have tried is below:
import xml.etree.ElementTree as et
doc = et.parse(filename)
nmsp = '{http://www.opengis.net/kml/2.2}'
name = []
address1 = []
for pm in doc.iterfind('.//{0}Placemark'.format(nmsp)):
    print(pm.find('{0}name'.format(nmsp)).text)
    name.append(pm.find('{0}name'.format(nmsp)).text)
    for adr1 in pm.iterfind('{0}ExtendedData//{0}value'.format(nmsp)):
        address1.append(adr1.text.strip().replace('\n',''))
        print(adr1.text.strip().replace('\n',''))
When I run this, I get the first record with the first address line 1 perfectly, but I also get the following error:
AttributeError: 'NoneType' object has no attribute 'strip'
I believe that this is because in the first record, Address 2 is empty. I also believe that the loop is actually trying to pull every value from ExtendedData at once, which is not what I want.
The real difficulty I am having is pulling the <Data name="..."> ... </Data> fields.
This is my first crack at XML/KML parsing, so I would greatly appreciate any help. I really have no clue what to try next at this point.
The end file will be a CSV file with headers: Name, Address 1, Address 2, City, State, Zip. Honestly, I am also fine with just getting rid of Address 2. It's not critical to have.
If you need further clarification, please just ask. Thank you in advance for your time.
Since KML files are XML files, consider XSLT, the special-purpose language designed to transform XML files into other XML, HTML, or even CSV formats.
Both Python with lxml and R with xslt (an extension package to xml2) can run XSLT 1.0 scripts.
XSLT (save as .xsl, a special .xml file)
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:doc="http://www.opengis.net/kml/2.2">
  <xsl:output indent="yes" method="text" encoding="UTF-8"/>
  <xsl:strip-space elements="*"/>

  <!-- Header row, then one row per Placemark -->
  <xsl:template match="/doc:kml">
    <xsl:copy>
      <xsl:text>Name,Address 1,Address 2,City,State,Zip&#xa;</xsl:text>
      <xsl:apply-templates select="descendant::doc:Placemark"/>
    </xsl:copy>
  </xsl:template>

  <!-- One comma-separated line per Placemark -->
  <xsl:template match="doc:Placemark">
    <xsl:copy>
      <xsl:value-of select="concat(doc:name, ',',
                                   doc:ExtendedData/doc:Data[@name='Address'], ',',
                                   doc:ExtendedData/doc:Data[@name='Address Line2'], ',',
                                   doc:ExtendedData/doc:Data[@name='City'], ',',
                                   doc:ExtendedData/doc:Data[@name='Location'], ',',
                                   doc:ExtendedData/doc:Data[@name='Postal Code'])"/>
      <xsl:text>&#xa;</xsl:text>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
Python
import lxml.etree as et
# INPUT XML AND XSL SOURCES
xml = et.parse('/path/to/Input.kml')
xsl = et.parse('/path/to/Script.xsl')
# RUN TRANSFORMATION
transformer = et.XSLT(xsl)
new_xml = transformer(xml)
# PRINT TO CONSOLE
print(new_xml)
# Name,Address 1,Address 2,City,State,Zip
# Bomb City Enterprises,306 S Cleveland St,,Amarillo,Alabama,79102
# Cahaba Brewing Company,4500 5th Ave. S,building C,Birmingham,Alabama,35222
# Redmont Distilling Company,4550 5th Ave South,building N,Birmingham,Alabama,35222
# SAVE TO FILE
with open('/path/to/Output.csv', 'wb') as f:
    f.write(new_xml)
R
library(xml2)
library(xslt)
# PARSE XML AND XSLT
doc <- read_xml('/path/to/Input.kml')
style <- read_xml('/path/to/Script.xsl')
# TRANSFORM NESTED INPUT INTO FLATTER OUTPUT
new_xml <- xslt::xml_xslt(doc, style)
# SAVE CSV
f <- file("/path/to/Output.csv")
writeLines(new_xml, f)
close(f)
# BUILD DATA FRAME
final_df <- read.csv('/path/to/Output.csv')
#                         Name          Address.1  Address.2       City   State   Zip
# 1      Bomb City Enterprises 306 S Cleveland St              Amarillo Alabama 79102
# 2     Cahaba Brewing Company    4500 5th Ave. S building C Birmingham Alabama 35222
# 3 Redmont Distilling Company 4550 5th Ave South building N Birmingham Alabama 35222
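If you would rather avoid XSLT, the same extraction can be sketched with only the Python standard library. This is a rough alternative, not a tested solution for the full 28 MB file. It assumes that each Folder corresponds to one "layer" and holds its Placemark elements as direct children, and that the State column should come from State_Abbrev (as the question asks) rather than Location. The names KML_NS, FIELDS, placemark_row, rows_by_folder and to_csv are made up for this example, and SAMPLE is a trimmed copy of the first record in the posted excerpt. Looking up each Data element by its name attribute returns an empty string for an empty <value/>, which avoids the 'NoneType' error, and the csv module quotes any field that happens to contain a comma.

```python
import csv
import io
import xml.etree.ElementTree as ET

KML_NS = {"kml": "http://www.opengis.net/kml/2.2"}

# Output column header -> name attribute of the matching <Data> element.
# Using State_Abbrev for "State" is an assumption based on the question.
FIELDS = {
    "Address 1": "Address",
    "Address 2": "Address Line2",
    "City": "City",
    "State": "State_Abbrev",
    "Zip": "Postal Code",
}


def placemark_row(pm):
    """Return one row (list of strings) for a <Placemark> element."""
    # Index every <Data> value by its name attribute; empty <value/> becomes "".
    values = {
        data.get("name"): (data.findtext("kml:value", default="", namespaces=KML_NS) or "").strip()
        for data in pm.iterfind("kml:ExtendedData/kml:Data", KML_NS)
    }
    name = (pm.findtext("kml:name", default="", namespaces=KML_NS) or "").strip()
    # Missing fields fall back to "" so every row has the same length.
    return [name] + [values.get(data_name, "") for data_name in FIELDS.values()]


def rows_by_folder(root):
    """Map each <Folder> name to the rows of its direct <Placemark> children."""
    result = {}
    for folder in root.iterfind(".//kml:Folder", KML_NS):
        folder_name = (folder.findtext("kml:name", default="", namespaces=KML_NS) or "").strip()
        result[folder_name] = [placemark_row(pm) for pm in folder.iterfind("kml:Placemark", KML_NS)]
    return result


def to_csv(rows):
    """Render rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Name", *FIELDS])
    writer.writerows(rows)
    return buf.getvalue()


# Trimmed copy of the first record from the posted excerpt.
SAMPLE = """<kml xmlns="http://www.opengis.net/kml/2.2">
<Document>
<Folder>
<name>Distilleries</name>
<Placemark>
<name>Bomb City Enterprises</name>
<ExtendedData>
<Data name="Address"><value>306 S Cleveland St</value></Data>
<Data name="Address Line2"><value/></Data>
<Data name="City"><value>Amarillo</value></Data>
<Data name="State_Abbrev"><value>AL</value></Data>
<Data name="Postal Code"><value>79102</value></Data>
</ExtendedData>
</Placemark>
</Folder>
</Document>
</kml>"""

root = ET.fromstring(SAMPLE)
tables = rows_by_folder(root)
csv_text = to_csv(tables["Distilleries"])
print(csv_text)
```

For the real file, ET.parse('/path/to/Input.kml').getroot() would replace ET.fromstring(SAMPLE), and each entry of tables could be written to its own file opened with newline="" and encoding="utf-8".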