
Reading a komoot xml file (gpx) with pandas


I want to read an XML file generated by komoot into a pandas DataFrame. Here is the structure of the file:

<?xml version='1.0' encoding='UTF-8'?>
<gpx version="1.1" creator="https://www.komoot.de" xmlns="http://www.topografix.com/GPX/1/1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.topografix.com/GPX/1/1 http://www.topografix.com/GPX/1/1/gpx.xsd">
  <metadata>
    <name>Title</name>
    <author>
      <link href="https://www.komoot.de">
        <text>komoot</text>
        <type>text/html</type>
      </link>
    </author>
  </metadata>
  <trk>
    <name>Title</name>
    <trkseg>
      <trkpt lat="60.126749" lon="4.250254">
        <ele>455.735013</ele>
        <time>2023-08-20T17:42:34.674Z</time>
      </trkpt>
      <trkpt lat="60.126580" lon="4.250247">
        <ele>455.735013</ele>
        <time>2023-08-20T17:42:36.695Z</time>
      </trkpt>
      <trkpt lat="60.126484" lon="4.250240">
        <ele>455.735013</ele>
        <time>2023-08-20T17:44:15.112Z</time>
      </trkpt>
    </trkseg>
  </trk>
</gpx>

I tried this code:

pd.read_xml('testfile.gpx',xpath='./gpx/trk/trkseg')

But there seems to be a problem with my XPath expression, because I get this ValueError:

ValueError: xpath does not return any nodes. Be sure row level nodes are in xpath. If document uses namespaces denoted with xmlns, be sure to define namespaces and use them in xpath.

I tried several other XPath expressions, but none of them returned any nodes.


Solution

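  • As the ValueError message suggests, you need to define the namespace and use it in the XPath. The root element declares a default namespace (xmlns="http://www.topografix.com/GPX/1/1"), so every element in the file belongs to it and an unprefixed path such as ./gpx/trk/trkseg matches nothing. Pass a prefix-to-URI mapping to read_xml via namespaces, use that prefix in xpath, and point the XPath at the row-level trkpt nodes: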

    import pandas as pd

    # "doc" is an arbitrary prefix bound to the GPX default namespace
    df = pd.read_xml(
        "testfile.gpx",
        xpath=".//doc:trkseg/doc:trkpt",
        namespaces={"doc": "http://www.topografix.com/GPX/1/1"},
    )
    

    Output:

    print(df)
    
             lat       lon         ele                      time
    0  60.126749  4.250254  455.735013  2023-08-20T17:42:34.674Z
    1  60.126580  4.250247  455.735013  2023-08-20T17:42:36.695Z
    2  60.126484  4.250240  455.735013  2023-08-20T17:44:15.112Z
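
To see why the prefix matters, here is a minimal sketch that uses only the standard library's xml.etree.ElementTree instead of pandas. It is an illustration under assumptions, not what read_xml does internally: the GPX string is a trimmed copy of the file from the question, and the names GPX, ns, unprefixed, prefixed and rows are made up for this example.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the GPX file from the question (metadata block omitted).
GPX = """<gpx version="1.1" creator="https://www.komoot.de" xmlns="http://www.topografix.com/GPX/1/1">
  <trk>
    <name>Title</name>
    <trkseg>
      <trkpt lat="60.126749" lon="4.250254">
        <ele>455.735013</ele>
        <time>2023-08-20T17:42:34.674Z</time>
      </trkpt>
      <trkpt lat="60.126580" lon="4.250247">
        <ele>455.735013</ele>
        <time>2023-08-20T17:42:36.695Z</time>
      </trkpt>
    </trkseg>
  </trk>
</gpx>"""

root = ET.fromstring(GPX)

# The prefix name is arbitrary; only the URI has to match the xmlns value.
ns = {"doc": "http://www.topografix.com/GPX/1/1"}

# Unprefixed names refer to elements in no namespace, so nothing matches.
unprefixed = root.findall("./trk/trkseg/trkpt")

# Prefixed names are resolved through the mapping and find the track points.
prefixed = root.findall(".//doc:trkseg/doc:trkpt", ns)

# Build one dict per track point: attributes plus child element text.
rows = [
    {
        "lat": float(pt.get("lat")),
        "lon": float(pt.get("lon")),
        "ele": float(pt.findtext("doc:ele", namespaces=ns)),
        "time": pt.findtext("doc:time", namespaces=ns),
    }
    for pt in prefixed
]

print(root.tag)                        # the tag carries the namespace URI in braces
print(len(unprefixed), len(prefixed))  # number of matches without and with the prefix
```

The parsed root tag includes the namespace URI in braces, which is why a plain gpx, trk or trkpt in the path never matches. The rows list could be passed to pd.DataFrame(rows) if you prefer to build the frame by hand.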