
Python - trouble with nested variables in XML using minidom


I'm using Python (minidom) to parse an XML file and print the values as CSV, like this:

0.0,0.0,51.3,46.6,
49.9,49.0,51.0,46.6,
36.0,24.0,47.8,42.2,
51.0,46.6,49.3,34.1,

Instead, the program produces the following. It skips the values from the 'Q' tags unless they come from the last 'Q' tag in their 'Event'. Why is my program failing to print these?

0.0,0.0,,46.6
49.9,49.0,,
36.0,24.0,42.2,
51.0,46.6,,

Here is the XML source file:

<?xml version="1.0" encoding="UTF-8"?>
<Games id = "1">
  <Game id="1" competition_id="1">
    <Event id="0" x="0.0" y="0.0">
      <Q id="a" end_x="51.3" />
      <Q id="b" end_y="46.6" />
    </Event>
    <Event id="1" x="49.9" y="49.0">
      <Q id="a" end_x="51.0" />
      <Q id="b" end_y="46.6" />
      <Q id="c" q1="tap" />
    </Event>
    <Event id="2" x="36.0" y="24.0">
      <Q id="a" end_y="47.8" />
      <Q id="b" end_x="42.2" />
    </Event>
    <Event id="3" x="51.0" y="46.6">
      <Q id="a" end_y="49.3" />
      <Q id="b" end_x="34.1" />
      <Q id="c" q1="17.8" />
    </Event>
  </Game>
</Games>

And here is my code:

from xml.dom.minidom import parse

DOMTree = parse('myfile.xml')
collection = DOMTree.documentElement

# Get all events in the collection
events = collection.getElementsByTagName("Event")

# Iterate through events
for event in events:
    start_x = event.getAttribute('x')
    start_y = event.getAttribute('y')

    qualifiers = event.getElementsByTagName('Q')

    # Iterate through qualifiers
    for qualifier in qualifiers:
        end_x = qualifier.getAttribute('end_x')
        end_y = qualifier.getAttribute('end_y')

    print(start_x + ',' + start_y + ',' + end_x + ',' + end_y)
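
The overwriting can be reproduced in isolation. This is a minimal sketch that assumes one Event (Event 0 from the file above) is inlined and parsed with parseString instead of reading myfile.xml; the SNIPPET name is only for illustration.

```python
from xml.dom.minidom import parseString

# Event 0 from the question, inlined so the sketch needs no myfile.xml
# (SNIPPET is a hypothetical name used only for this illustration).
SNIPPET = """<Event id="0" x="0.0" y="0.0">
  <Q id="a" end_x="51.3" />
  <Q id="b" end_y="46.6" />
</Event>"""

event = parseString(SNIPPET).documentElement

for qualifier in event.getElementsByTagName('Q'):
    # getAttribute returns '' when the attribute is missing, so each Q
    # overwrites whatever the previous Q stored in end_x / end_y.
    end_x = qualifier.getAttribute('end_x')
    end_y = qualifier.getAttribute('end_y')
    print(qualifier.getAttribute('id'), repr(end_x), repr(end_y))

# After the loop only the last Q's values survive.
print(repr(end_x), repr(end_y))
```

The final print shows end_x as '' even though the first Q set it to '51.3'.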

Solution

  • Look at the structure of your for qualifier in qualifiers loop: nothing is done with end_x and end_y inside it, so each iteration simply reassigns them. Because getAttribute returns an empty string when a Q element doesn't have that attribute, a later Q wipes out a value that an earlier Q set. At the end of the loop, end_x and end_y hold only the values from the last Q, which is why the others go missing.

    from xml.dom.minidom import parse
    
    DOMTree = parse('myfile.xml')
    collection = DOMTree.documentElement
    
    # Get all events in the collection
    events = collection.getElementsByTagName("Event")
    
    # Iterate through events
    for event in events:
        start_x = event.getAttribute('x')
        start_y = event.getAttribute('y')
    
        # Reset for every event so values don't leak in from the previous one
        end_x = ''
        end_y = ''
    
        qualifiers = event.getElementsByTagName('Q')
    
        # Iterate through qualifiers, only keeping attributes that are present
        for qualifier in qualifiers:
            if qualifier.hasAttribute('end_x'):
                end_x = qualifier.getAttribute('end_x')
            if qualifier.hasAttribute('end_y'):
                end_y = qualifier.getAttribute('end_y')
    
        print(start_x + ',' + start_y + ',' + end_x + ',' + end_y)
    

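    The code above should do what you want. One thing I noticed: in your sample output the first two events list end_x before end_y, but the last two list end_y before end_x (the order they appear in the file). This code always prints end_x first, so the end coordinates on the last two lines come out swapped relative to your sample. The output is: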

    0.0,0.0,51.3,46.6
    49.9,49.0,51.0,46.6
    36.0,24.0,42.2,47.8
    51.0,46.6,34.1,49.3
    

    Hope this helps
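If you actually want the end coordinates in the order they appear in the file, as in the question's sample output, one option is to collect them per event in a list instead of two variables. This is a minimal sketch, assuming the file layout shown in the question. The inline XML (Events 1 and 3 copied from the question) and the helper name event_rows are hypothetical, and the standard csv module is swapped in for manual string joining.

```python
import csv
import sys
from xml.dom.minidom import parseString

# Hypothetical inline copy of Events 1 and 3 from the question;
# use parse('myfile.xml') instead of parseString(XML) for the real file.
XML = """<Game id="1" competition_id="1">
  <Event id="1" x="49.9" y="49.0">
    <Q id="a" end_x="51.0" />
    <Q id="b" end_y="46.6" />
    <Q id="c" q1="tap" />
  </Event>
  <Event id="3" x="51.0" y="46.6">
    <Q id="a" end_y="49.3" />
    <Q id="b" end_x="34.1" />
    <Q id="c" q1="17.8" />
  </Event>
</Game>"""


def event_rows(document):
    """Yield one row per Event: x, y, then end values in document order."""
    for event in document.getElementsByTagName('Event'):
        row = [event.getAttribute('x'), event.getAttribute('y')]
        for qualifier in event.getElementsByTagName('Q'):
            # Only append attributes that are actually present, so a Q
            # without end_x/end_y (like the q1 ones) adds nothing.
            for name in ('end_x', 'end_y'):
                if qualifier.hasAttribute(name):
                    row.append(qualifier.getAttribute(name))
        yield row


rows = list(event_rows(parseString(XML)))
# lineterminator='\n' avoids csv's default '\r\n' line endings on stdout.
csv.writer(sys.stdout, lineterminator='\n').writerows(rows)
```

Because each row is a list, an event with more or fewer end attributes simply produces a longer or shorter row rather than carrying stale values over from the previous event.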