
Lxml error when parsing kml using pykml


I'm trying to parse a KML file containing multiple placemarks using pykml. I want to edit the HTML code inside the KML's description, mainly to visualize geographic data in Google Earth. I've researched a lot of ways to do so; however, I always get the lxml error shown below. :(

    Traceback (most recent call last):
      File "C:\Users\Arellano\Copy\BSGE\2015-2016 SUMMER\trial7.py", line 5, in <module>
        root = parser.fromstring(open('trim_KML.kml', 'r').read())
      File "C:\Program Files (x86)\Python2.7.10\lib\site-packages\pykml-0.1.0-py2.7.egg\pykml\parser.py", line 41, in fromstring
        return objectify.fromstring(text)
      File "src/lxml/lxml.objectify.pyx", line 1801, in lxml.objectify.fromstring (src\lxml\lxml.objectify.c:25171)
      File "src/lxml/lxml.etree.pyx", line 3213, in lxml.etree.fromstring (src\lxml\lxml.etree.c:77697)
      File "src/lxml/parser.pxi", line 1819, in lxml.etree._parseMemoryDocument (src\lxml\lxml.etree.c:116494)
      File "src/lxml/parser.pxi", line 1707, in lxml.etree._parseDoc (src\lxml\lxml.etree.c:115144)
      File "src/lxml/parser.pxi", line 1079, in lxml.etree._BaseParser._parseDoc (src\lxml\lxml.etree.c:109543)
      File "src/lxml/parser.pxi", line 573, in lxml.etree._ParserContext._handleParseResultDoc (src\lxml\lxml.etree.c:103404)
      File "src/lxml/parser.pxi", line 683, in lxml.etree._handleParseResult (src\lxml\lxml.etree.c:105058)
      File "src/lxml/parser.pxi", line 613, in lxml.etree._raiseParseError (src\lxml\lxml.etree.c:103967)
    XMLSyntaxError: Namespace prefix xsi for schemaLocation on Document is not defined, line 3, column 32

Here's my code snippet (which is supposed to work, based on one of my sources):

    from lxml import etree  # needed for etree.tostring below
    from pykml import parser

    root = parser.fromstring(open('trim_KML.kml', 'r').read())
    print etree.tostring(root.Document.Placemark.LineString.Description)

I have installed pykml and lxml 3.6.0, and I'm currently using Python 2.7.10. The KML file contains lines (KML link: https://sites.google.com/site/kmlhostingmwss/trim.kml). I also have Python 2.7 from my ArcGIS 10.2 installation.

I'm new to working with KML files. Can someone please tell me what I am doing wrong? Or is there an easier way to edit the description of KML files? Thank you very much. :)))


Solution

  • The XML has some issues. To remove the error, add xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" to the root kml element on the second line, so that the xsi prefix used by schemaLocation is declared:

    <kml  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.opengis.net/kml/2.2" xmlns:gx="http://www.google.com/kml/ext/2.2" xmlns:kml="http://www.opengis.net/kml/2.2" xmlns:atom="http://www.w3.org/2005/Atom">
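
If editing the file by hand is not convenient, the same declaration can be added in code before parsing. The following is only a rough sketch: `declare_xsi` is a hypothetical helper, the `broken` string is a made-up, heavily trimmed stand-in for trim.kml, and the standard library's `xml.etree.ElementTree` is used just to keep the example dependency-free. The patched string could presumably be passed to pykml's `parser.fromstring` in the same way.

```python
import re
import xml.etree.ElementTree as ET

XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"

def declare_xsi(kml_text):
    """Hypothetical helper: declare the xsi prefix on the first <kml tag if missing."""
    if "xmlns:xsi=" in kml_text:
        return kml_text  # already declared, nothing to do
    # Insert the declaration right after the opening "<kml" of the root tag.
    return re.sub(r"<kml\b", '<kml xmlns:xsi="%s"' % XSI_NS, kml_text, count=1)

# Made-up stand-in for trim.kml (assumption, not the real file contents).
broken = (
    '<kml xmlns="http://www.opengis.net/kml/2.2">'
    '<Document xsi:schemaLocation="http://www.opengis.net/kml/2.2 ogckml22.xsd">'
    '<Placemark><description>demo</description></Placemark>'
    '</Document></kml>'
)

fixed = declare_xsi(broken)
root = ET.fromstring(fixed)  # the unpatched string fails here with a parse error
print(root.tag)
```

A plain string substitution is used because the file cannot be parsed until the prefix is declared, so the fix has to happen on the raw text.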
    

    Then using lxml, the following works:

    import lxml.etree as et
    
    xml = et.parse("trim.kml").getroot()
    
    print(xml.xpath("//kml:Document//kml:Placemark/kml:description", namespaces={"kml":xml.nsmap["kml"]}))
    

    Which gives you:

    [<Element {http://www.opengis.net/kml/2.2}description at 0x7f612d0885f0>, <Element {http://www.opengis.net/kml/2.2}description at 0x7f612d088cb0>, <Element {http://www.opengis.net/kml/2.2}description at 0x7f612d088d40>, <Element {http://www.opengis.net/kml/2.2}description at 0x7f612d088d88>, <Element {http://www.opengis.net/kml/2.2}description at 0x7f612d088dd0>, <Element {http://www.opengis.net/kml/2.2}description at 0x7f612d088e18>]
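
Since the goal in the question is to edit the HTML inside each description, here is a small sketch of what could be done with elements like those in the list above. It assumes the description's HTML is available as the element's `.text`; the `sample` string is a made-up stand-in for trim.kml, and the `<b>` to `<i>` swap is only a placeholder edit. The import falls back to the standard library if lxml is not installed, since `fromstring`, `findall`, and `tostring` are available in both.

```python
try:
    import lxml.etree as et              # preferred, as in the answer
except ImportError:
    import xml.etree.ElementTree as et   # stdlib fallback offering the same calls

KML_NS = {"kml": "http://www.opengis.net/kml/2.2"}

# Made-up stand-in for trim.kml (assumption, not the real file contents).
sample = (
    '<kml xmlns="http://www.opengis.net/kml/2.2"><Document>'
    '<Placemark><description>&lt;b&gt;Line 1&lt;/b&gt;</description></Placemark>'
    '<Placemark><description>&lt;b&gt;Line 2&lt;/b&gt;</description></Placemark>'
    '</Document></kml>'
)

root = et.fromstring(sample)
descriptions = root.findall(".//kml:Placemark/kml:description", namespaces=KML_NS)

for desc in descriptions:
    # .text holds the description's HTML as an ordinary string
    desc.text = (desc.text or "").replace("<b>", "<i>").replace("</b>", "</i>")

print([d.text for d in descriptions])

# Serialize the edited tree; the result can be written to a new .kml file.
edited = et.tostring(root)
```

Writing `edited` to a new file, rather than overwriting the original, keeps an untouched copy to compare against in Google Earth.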
    

    You could also use lxml.html, which copes better with broken XML; the data itself is also 99 percent HTML.

    The HTML parser lowercases tag names and does not apply XML namespaces, so you can get the descriptions inside each placemark with:

    from lxml import html
    xml = html.parse("trim.kml")
    print(xml.xpath("//placemark/description"))
    

    Which gives you:

    [<Element description at 0x7f1c757fad08>, <Element description at 0x7f1c757fad60>, <Element description at 0x7f1c757fadb8>, <Element description at 0x7f1c757fae10>, <Element description at 0x7f1c757fae68>, <Element description at 0x7f1c757faec0>]