I'm a Python and XML beginner, so this may seem a little easy, but it challenges my assumptions. I am trying to parse an XML structure like this:
<variable ordernumber="175">
  <name>Some_text</name>
  <label>Label text</label>
  <values>
    <value code="5">Five</value>
    <value code="4">Four</value>
    <value code="3">Three</value>
    <value code="2">Two</value>
    <value code="1">One</value>
    <value code="0">Zero</value>
  </values>
</variable>
using minidom.
I am trying to extract the text from the name, label and value elements:
import xml.dom.minidom as md

dom = md.parse(input_file)  # input_file: path to the XML file
root = dom.documentElement
for var in dom.getElementsByTagName('variable'):
    # An element's text is held in a child text node, hence .firstChild
    var_name = var.getElementsByTagName('name')[0].firstChild.nodeValue
    var_label = var.getElementsByTagName('label')[0].firstChild.nodeValue
    var_values_list = var.getElementsByTagName('value')
    for var_value in var_values_list:
        print(var_name, var_label, var_value.firstChild.nodeValue)
This is working fine, but there is one thing I do not understand. Why isn't it possible to get var_name like this:
var_name = var.getElementsByTagName('name')[0].nodeValue
Why is 'Some_text' a child of <name>? Why isn't it the nodeValue? What would be a nodeValue in this context?
Of course, the same goes for <label> and <value>.
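It's bad design, but in the DOM, the nodeValue property of an element is null (minidom reports this as None). The text 'Some_text' is not stored on the <name> element itself; it is held in a separate text node that is a child of the element, which is why you have to go through firstChild. nodeValue only carries content for node types such as text, comment and attribute nodes. See for example https://www.w3schools.com/jsref/prop_node_nodevalue.asp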
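To see this for yourself, here is a minimal sketch using minidom. The one-element document is a trimmed-down, hypothetical version of the XML in the question, and the variable names are only for illustration:

```python
import xml.dom.minidom as md

# Hypothetical one-element document, trimmed from the question's example.
dom = md.parseString(
    '<variable ordernumber="175"><name>Some_text</name></variable>'
)
name = dom.getElementsByTagName('name')[0]

# The element itself carries no value; minidom reports DOM null as None.
element_value = name.nodeValue

# The text lives in a separate text node, the element's first child.
text_node = name.firstChild
text_value = text_node.nodeValue
is_text = text_node.nodeType == text_node.TEXT_NODE

# Attribute values are read through the element with getAttribute().
order = dom.documentElement.getAttribute('ordernumber')

print(element_value, text_value, is_text, order)
```

So an element is just a container: its text and its attributes are each reached through their own node or accessor, never through the element's nodeValue.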
There are many better-designed and more modern tree models for XML than DOM, but I don't know if there's anything available in the Python world.
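One candidate that ships with Python is xml.etree.ElementTree, where an element exposes its text directly through a .text attribute. The following is only a sketch under that assumption, using a shortened, hypothetical copy of the XML from the question rather than a real input file:

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical copy of the XML from the question.
xml_text = """<variable ordernumber="175">
  <name>Some_text</name>
  <label>Label text</label>
  <values>
    <value code="1">One</value>
    <value code="0">Zero</value>
  </values>
</variable>"""

var = ET.fromstring(xml_text)

# findtext() returns the text of the first matching child element.
var_name = var.findtext('name')
var_label = var.findtext('label')

# iter() walks all descendants with the given tag; .text is the element's text.
values = [(v.get('code'), v.text) for v in var.iter('value')]

print(var_name, var_label, values)
```

Whether this counts as better designed is a matter of taste, but it does avoid the firstChild.nodeValue detour for simple text-only elements.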