pandas.read_xml() unexpected behaviour

I am trying to understand why the code:

import pandas

xml = '''<ROOT>
  <ELEM atr="anything">1</ELEM>
  <ELEM atr="anything">2</ELEM>
  <ELEM atr="anything">3</ELEM>
  <ELEM atr="anything">4</ELEM>
  <ELEM atr="anything">5</ELEM>
  <ELEM atr="anything">6</ELEM>
  <ELEM atr="anything">7</ELEM>
  <ELEM atr="anything">8</ELEM>
  <ELEM atr="anything">9</ELEM>
  <ELEM atr="anything">10</ELEM>
</ROOT>'''
df = pandas.read_xml(xml, xpath='/ROOT/ELEM')
print(df)

... works as expected and prints:

        atr  ELEM
0  anything     1
1  anything     2
2  anything     3
3  anything     4
4  anything     5
5  anything     6
6  anything     7
7  anything     8
8  anything     9
9  anything    10

Yet the following code:

import pandas

xml = '''<ROOT>
  <ELEM>1</ELEM>
  <ELEM>2</ELEM>
  <ELEM>3</ELEM>
</ROOT>'''
df = pandas.read_xml(xml, xpath='/ROOT/ELEM')

results in the error:

ValueError: xpath does not return any nodes or attributes. Be sure to
specify in `xpath` the parent nodes of children and attributes to
parse. If document uses namespaces denoted with xmlns, be sure to
define namespaces and use them in xpath.

I have read the documentation and also checked my xpath (the code above is just a minimal example; the actual XML I use is more complex).

In a nutshell, I need to read a list of XML child elements at a known xpath into a pandas DataFrame. The child elements have no attributes, but all of them have text values. I want to get a DataFrame with one column containing these values. What am I doing wrong?


  • According to the documentation, pandas.read_xml expects each matched node to be a row whose attributes and child elements supply the columns. In your first example, each <ELEM> is a row and atr gives it a column. In your second example, the <ELEM> nodes have no attributes and no children, so there are no columns to build. If you had <ELEM><VAL>1</VAL></ELEM>, it should work, because VAL would be the column.
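
If changing the XML is not an option, one possible workaround is to collect the element text yourself and build the frame from that. The sketch below uses only the standard library's xml.etree.ElementTree. The sample document, the helper name text_rows, and the column name ELEM are assumptions made for illustration, not part of pandas.

```python
import xml.etree.ElementTree as ET

# Assumed input: the attribute-free shape described in the question.
XML = """<ROOT>
  <ELEM>1</ELEM>
  <ELEM>2</ELEM>
  <ELEM>3</ELEM>
</ROOT>"""


def text_rows(xml_text, path="./ELEM", column="ELEM"):
    """Return one {column: text} dict per element matched by `path`.

    `path` is relative to the document root, so "./ELEM" here plays the
    role of the XPath /ROOT/ELEM used in the question.
    """
    root = ET.fromstring(xml_text)
    return [{column: el.text} for el in root.findall(path)]


rows = text_rows(XML)
print(rows)
```

Passing rows to pandas.DataFrame(rows) should then give a frame with a single ELEM column. The values are still strings at that point, so convert them (for example with pandas.to_numeric) if you need numbers.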