Tags: python, xml, python-2.7, minidom

Python - understanding XML structure when parsing with minidom


I'm a Python and XML beginner, so this may seem easy, but it challenges my assumptions. I am trying to parse an XML structure like this:

<variable ordernumber="175">
  <name>Some_text</name>
  <label>Label text</label>
  <values>
    <value code="5">Five</value>
    <value code="4">Four</value>
    <value code="3">Three</value>
    <value code="2">Two</value>
    <value code="1">One</value>
    <value code="0">Zero</value>
  </values>
</variable>

using minidom.

I am trying to extract the text of the name, label and value elements:

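from __future__ import print_function  # same print() output on Python 2.7 and 3
import xml.dom.minidom as md

dom = md.parse(input_file)  # input_file: a path or an open file object
for var in dom.getElementsByTagName('variable'):
    # The text of <name>/<label> is held by their first child, a Text node
    var_name = var.getElementsByTagName('name')[0].firstChild.nodeValue
    var_label = var.getElementsByTagName('label')[0].firstChild.nodeValue
    var_values_list = var.getElementsByTagName('value')
    for var_value in var_values_list:
        # var_value (not var_values) is the current <value> element
        print(var_name, var_label, var_value.getAttribute('code'),
              var_value.firstChild.nodeValue)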
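Reading `firstChild.nodeValue` assumes the element has exactly one text child, so it raises AttributeError on an empty element such as `<label/>`, where `firstChild` is None. A minimal sketch of a more defensive helper, modelled on the `getText` function in the "DOM Example" of the Python `xml.dom.minidom` documentation; `get_text` and `SAMPLE` are hypothetical names, and the sample is a trimmed copy of the XML above:

```python
import xml.dom.minidom as md

# Trimmed copy of the question's XML (hypothetical sample data).
SAMPLE = """<variable ordernumber="175">
  <name>Some_text</name>
  <label>Label text</label>
  <values>
    <value code="5">Five</value>
    <value code="0">Zero</value>
  </values>
</variable>"""


def get_text(element):
    """Concatenate the data of all Text/CDATA children of an element.

    Unlike element.firstChild.nodeValue, this returns '' for an empty
    element instead of failing on a None firstChild.
    """
    parts = []
    for child in element.childNodes:
        if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE):
            parts.append(child.data)
    return ''.join(parts)


dom = md.parseString(SAMPLE)
for var in dom.getElementsByTagName('variable'):
    var_name = get_text(var.getElementsByTagName('name')[0])
    var_label = get_text(var.getElementsByTagName('label')[0])
    # (code attribute, text) pairs for every <value> below this <variable>
    values = [(v.getAttribute('code'), get_text(v))
              for v in var.getElementsByTagName('value')]
    print(var_name, var_label, values)
```

On Python 3 this prints `Some_text Label text [('5', 'Five'), ('0', 'Zero')]`.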

This is working fine, but there is one thing I do not understand: Why isn't it possible to get the var_name like this:

var_name=var.getElementsByTagName('name')[0].nodeValue

Why is 'Some_text' a child of <name>? Why isn't it the nodeValue of <name>? What would a nodeValue be in this context? Of course, the same goes for <label> and <value>.
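One way to see the structure minidom actually builds is to print each node's type, name and nodeValue. A minimal sketch on a trimmed copy of the XML; `SAMPLE`, `TYPE_NAMES` and `describe` are hypothetical names used only for this illustration:

```python
from xml.dom import Node
import xml.dom.minidom as md

# Trimmed copy of the question's XML, keeping the whitespace after <variable>.
SAMPLE = '<variable ordernumber="175">\n  <name>Some_text</name>\n</variable>'

# Readable names for the nodeType constants that appear below.
TYPE_NAMES = {
    Node.ELEMENT_NODE: 'ELEMENT',
    Node.TEXT_NODE: 'TEXT',
    Node.ATTRIBUTE_NODE: 'ATTRIBUTE',
}


def describe(node):
    """Return (type, nodeName, nodeValue) for a DOM node."""
    return (TYPE_NAMES.get(node.nodeType, node.nodeType),
            node.nodeName, node.nodeValue)


dom = md.parseString(SAMPLE)
var = dom.documentElement
name = var.getElementsByTagName('name')[0]

print(describe(name))             # ('ELEMENT', 'name', None)
print(describe(name.firstChild))  # ('TEXT', '#text', 'Some_text')
print(describe(var.getAttributeNode('ordernumber')))  # ('ATTRIBUTE', 'ordernumber', '175')
print(describe(var.firstChild))   # ('TEXT', '#text', '\n  ')
```

The element itself has no value; the string lives in a child Text node, and an attribute node carries its value directly. The last line also shows that the indentation between tags becomes Text nodes of its own, which is why `var.firstChild` is not `<name>`.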


Solution

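  • It's bad design, but in the DOM, the nodeValue property of an element node is null (None in minidom). The text 'Some_text' is stored in a separate Text node that is a child of <name>, and it is that Text node whose nodeValue holds the string. See for example https://www.w3schools.com/jsref/prop_node_nodevalue.asp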

    There are many better-designed and more modern tree models for XML than DOM, but I don't know if there's anything available in the Python world.
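For comparison, Python's standard library does ship a different tree model, `xml.etree.ElementTree`, in which an element's character data is exposed directly as its `.text` attribute rather than through a child node. A minimal sketch on a trimmed copy of the question's XML (`SAMPLE` is a hypothetical name):

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's XML (hypothetical sample data).
SAMPLE = """<variable ordernumber="175">
  <name>Some_text</name>
  <label>Label text</label>
  <values>
    <value code="5">Five</value>
    <value code="0">Zero</value>
  </values>
</variable>"""

root = ET.fromstring(SAMPLE)        # root is the <variable> element
# The text is an attribute of the element itself, no firstChild needed.
var_name = root.find('name').text
var_label = root.find('label').text
# iter() walks all descendants named 'value'; get() reads an XML attribute.
values = [(v.get('code'), v.text) for v in root.iter('value')]
order = root.get('ordernumber')
print(var_name, var_label, values, order)
```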