Python - Read an XML using minidom


I'm new to Python and have a question. I'm trying to parse this XML (the response contains several records; this is the first one I need to read):

<![CDATA[<?xml version="1.0" encoding="UTF-8"?><UDSObjectList>
<UDSObject>
<Handle>cr:908715</Handle>
<Attributes>
<Attribute DataType="2002">
<AttrName>ref_num</AttrName>
<AttrValue>497131</AttrValue>
</Attribute>
<Attribute DataType="2002">
<AttrName>support_lev.sym</AttrName>
<AttrValue/>
</Attribute>
<Attribute DataType="2004">
<AttrName>open_date</AttrName>
<AttrValue>1516290907</AttrValue>
</Attribute>
<Attribute DataType="58814636">
<AttrName>agt.id</AttrName>
<AttrValue/>
</Attribute>
<Attribute DataType="2005">
<AttrName>priority</AttrName>
<AttrValue>3</AttrValue>
</Attribute>
<Attribute DataType="2009">
<AttrName>tenant.id</AttrName>
<AttrValue>F3CA8B5A2A456742B21EF8F3B5538623</AttrValue>
</Attribute>
<Attribute DataType="2002">
<AttrName>tenant.name</AttrName>
<AttrValue>Ripley</AttrValue>
</Attribute>
<Attribute DataType="2005">
<AttrName>log_agent</AttrName>
<AttrValue>088966043F4D2944AA90067C52DA454F</AttrValue>
</Attribute>
<Attribute DataType="58826268">
<AttrName>request_by.first_name</AttrName>
<AttrValue/>
</Attribute>
<Attribute DataType="58826268">
<AttrName>request_by.first_name</AttrName>
<AttrValue/>
</Attribute>
<Attribute DataType="2002">
<AttrName>customer.first_name</AttrName>
<AttrValue>Juan Guillermo</AttrValue>
</Attribute>
<Attribute DataType="2002">
<AttrName>customer.last_name</AttrName>
<AttrValue>Mendoza Montero</AttrValue>
</Attribute>
<Attribute DataType="2009">
<AttrName>customer.id</AttrName>
<AttrValue>8C020EBAD32035419D7654CDE510D312</AttrValue>
</Attribute>
<Attribute DataType="2001">
<AttrName>category.id</AttrName>
<AttrValue>1121021012</AttrValue>
</Attribute>
<Attribute DataType="2002">
<AttrName>category.sym</AttrName>
<AttrValue>Ripley.Sistemas Financieros.Terminal Financiero.Mensaje de 
 Error</AttrValue>
</Attribute>
<Attribute DataType="2002">
<AttrName>status.sym</AttrName>
<AttrValue>Suspended</AttrValue>
</Attribute>
<Attribute DataType="2009">
<AttrName>group.id</AttrName>
<AttrValue>099621F7BD77C545B65FB65BFE466550</AttrValue>
</Attribute>
<Attribute DataType="2002">
<AttrName>group.last_name</AttrName>
<AttrValue>EUS_Zona V Region</AttrValue>
</Attribute>
<Attribute DataType="2001">
<AttrName>zreporting_met.id</AttrName>
<AttrValue>7300</AttrValue>
</Attribute>
<Attribute DataType="2002">
<AttrName>zreporting_met.sym</AttrName>
<AttrValue>E-Mail</AttrValue>
</Attribute>
<Attribute DataType="2002">
<AttrName>assignee.combo_name</AttrName>
<AttrValue/>
</Attribute>
<Attribute DataType="2004">
<AttrName>open_date</AttrName>
<AttrValue>1516290907</AttrValue>
</Attribute>
<Attribute DataType="2004">
<AttrName>close_date</AttrName>
<AttrValue/>
</Attribute>
<Attribute DataType="2002">
<AttrName>description</AttrName>
<AttrValue>Asunto       :Valaparaiso / Terminal Financiero Error
 Nombre Completo    :JUAN MENDOZA MONTERO
 Ubicación  :CCSS VALPARAISO Plaza victoria 1646, VALPARAISO
 País       :Chile
 Telefono   :ANEXO 2541
 Correo     :jmendozam@ripley.cl
 Descripción    :Error Terminal Financiero
 Descartes  :N/A</AttrValue>
 </Attribute>
 <Attribute DataType="2002">
 <AttrName>summary</AttrName>
 <AttrValue>Santiago / Modificación </AttrValue>
 </Attribute>
 </Attributes>
 </UDSObject>

but when I read the file with this method:

from zeep import Client
import xml.dom.minidom
from xml.dom.minidom import Node

def select():
    resultado = []
    sid = _client.service.login("User","password")
    objectType = 'cr'
    whereClause = "group.last_name LIKE 'EUS_ZONA%' AND open_date > 1517454000 AND open_date < 1519786800"
    maxRows = -1
    attributes = ["ref_num"
          ,"agt.id"
          ,"priority"
          ,"pcat.id"
          ,"tenant.id"
          ,"tenant.name"
          ,"log_agent"
          ,"request_by.first_name"
          ,"request_by.last_name"
          ,"customer.first_name"
          ,"customer.last_name"
          ,"customer.id"
          ,"category.id"
          ,"category.sym"
          ,"status.sym"
          ,"group.id"
          ,"group.last_name"
          ,"zreporting_met.id"
          ,"zreporting_met.sym"
          ,"assignee.combo_name"
          ,"open_date"
          ,"close_date"
          ,"description"
          ,"summary"]
    minim = _client.service.doSelect(sid=sid, objectType=objectType,
                                     whereClause=whereClause, maxRows=maxRows, attributes=attributes)
    dom = xml.dom.minidom.parseString(minim)
    nodeList = dom.getElementsByTagName('AttrValue')
    for j in range(len(nodeList)):
        resultado.append(dom.getElementsByTagName('AttrValue')[j].firstChild.wholeText)
        print(resultado[j])

    logout = _client.service.logout(sid)

This only prints the first AttrValue (the ref_num value). What I need is to add every field of the XML to the resultado list and print each one. Can someone help me with that?


Solution

  • Please read and follow "How to create a Minimal, Complete, and Verifiable example". You should remove all the server-related code and reduce the size of your sample data.


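    Your loop most likely stops after the first value because the second AttrValue element is empty: its firstChild is None, so reading .wholeText raises an AttributeError. This snippet follows your code: it gets all attribute elements and then iterates over them, substituting an empty string for empty elements. XML tag names are case-sensitive and the shortened sample here uses lowercase names, so with your data look up 'Attribute', 'AttrName' and 'AttrValue' instead: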

    import xml.dom.minidom
    from xml.dom.minidom import Node
    
    minim = """<?xml version="1.0" encoding="UTF-8"?>
    <udsobjectlist>
        <udsobject>
            <handle>cr:908715</handle>
            <attributes>
                <attribute datatype="2002">
                    <attrname>ref_num</attrname>
                    <attrvalue>497131</attrvalue>
                </attribute>
                <attribute datatype="2002">
                    <attrname>support_lev.sym</attrname>
                    <attrvalue/>
                </attribute>
                <attribute datatype="2004">
                    <attrname>open_date</attrname>
                    <attrvalue>1516290907</attrvalue>
                </attribute>
            </attributes>
        </udsobject>
    </udsobjectlist>
    """
    
    dom = xml.dom.minidom.parseString(minim)
    nodeList = dom.getElementsByTagName('attribute')
    
    resultado = []
    attributes = ["attrname", "attrvalue"]
    for node in nodeList:
        a = []
        for attribute in attributes:
            try:
                a.append(node.getElementsByTagName(attribute)[0].firstChild.wholeText)
            except AttributeError:
                a.append("")
        resultado.append(a)
    print(resultado)
    

    prints

    [['ref_num', '497131'], ['support_lev.sym', ''], ['open_date', '1516290907']]
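
If you need to look values up by name, the name/value pairs can also be collected into a dictionary. The following is a minimal sketch rather than a drop-in replacement: it assumes the tag names of your original response (Attribute, AttrName, AttrValue), uses a shortened sample, and the helper name first_text is made up for illustration.

```python
import xml.dom.minidom

# Shortened sample using the tag names from the question (assumption:
# the real response has the same structure).
sample = """<UDSObjectList>
  <UDSObject>
    <Handle>cr:908715</Handle>
    <Attributes>
      <Attribute DataType="2002">
        <AttrName>ref_num</AttrName>
        <AttrValue>497131</AttrValue>
      </Attribute>
      <Attribute DataType="2002">
        <AttrName>support_lev.sym</AttrName>
        <AttrValue/>
      </Attribute>
    </Attributes>
  </UDSObject>
</UDSObjectList>"""


def first_text(parent, tag):
    """Return the text of the first <tag> under parent, or "" if it is missing or empty."""
    elements = parent.getElementsByTagName(tag)
    if not elements or elements[0].firstChild is None:
        return ""
    return elements[0].firstChild.data


dom = xml.dom.minidom.parseString(sample)
by_name = {}
for attribute in dom.getElementsByTagName("Attribute"):
    by_name[first_text(attribute, "AttrName")] = first_text(attribute, "AttrValue")

print(by_name)  # {'ref_num': '497131', 'support_lev.sym': ''}
```

Note that a dictionary keeps only the last value when a name occurs more than once, as open_date does in your data.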
    

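    Even closer to your code (empty values are skipped):

    resultado = []  # start from an empty list again
    nodeList = dom.getElementsByTagName('attrvalue')
    for node in nodeList:
        try:
            v = node.firstChild.wholeText
        except AttributeError:
            # empty element: firstChild is None, so skip it
            continue
        resultado.append(v)
        print(v)
    print(resultado)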
    

    prints

    497131
    1516290907
    ['497131', '1516290907']
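
Because this loop skips empty elements, the resulting list no longer lines up with the attributes you requested. If positions matter, one option is to keep an empty string as a placeholder. This is only a sketch on a shortened sample; the helper name node_text is hypothetical. It joins all text children of an element, which should also cover values delivered as CDATA sections.

```python
import xml.dom.minidom
from xml.dom.minidom import Node

# Shortened sample with the lowercase tag names used in this answer.
sample = """<attributes>
  <attribute><attrname>ref_num</attrname><attrvalue>497131</attrvalue></attribute>
  <attribute><attrname>support_lev.sym</attrname><attrvalue/></attribute>
  <attribute><attrname>open_date</attrname><attrvalue>1516290907</attrvalue></attribute>
</attributes>"""


def node_text(node):
    """Join the text and CDATA children of node; an empty element gives ""."""
    return "".join(
        child.data
        for child in node.childNodes
        if child.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE)
    )


dom = xml.dom.minidom.parseString(sample)
values = [node_text(node) for node in dom.getElementsByTagName("attrvalue")]
print(values)  # ['497131', '', '1516290907']
```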
    

    As suggested in the comments, the same with ElementTree (ET). You probably should not access elements by index, but this might get you started:

    import xml.etree.ElementTree as ET
    root = ET.fromstring(minim)
    
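    for child in root[0][1]:  # root[0] is udsobject, root[0][1] is attributes
        try:
            print(child[0].text)  # attrname
            print(child[1].text)  # attrvalue (None when the element is empty)
        except IndexError:
            # element with fewer than two children
            pass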
    

    prints

    ref_num
    497131
    support_lev.sym
    None
    open_date
    1516290907
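
To avoid the index access mentioned above, the elements can be looked up by tag name instead. A minimal sketch, assuming the tag names from the question (Attribute, AttrName, AttrValue) and a shortened sample; the variable names are illustrative only:

```python
import xml.etree.ElementTree as ET

# Shortened sample using the tag names from the question.
sample = """<UDSObjectList>
  <UDSObject>
    <Handle>cr:908715</Handle>
    <Attributes>
      <Attribute DataType="2002">
        <AttrName>ref_num</AttrName>
        <AttrValue>497131</AttrValue>
      </Attribute>
      <Attribute DataType="2002">
        <AttrName>support_lev.sym</AttrName>
        <AttrValue/>
      </Attribute>
      <Attribute DataType="2004">
        <AttrName>open_date</AttrName>
        <AttrValue>1516290907</AttrValue>
      </Attribute>
    </Attributes>
  </UDSObject>
</UDSObjectList>"""

root = ET.fromstring(sample)

pairs = []
# iter() walks the whole tree, so the nesting depth does not matter.
for attribute in root.iter("Attribute"):
    # "or ''" turns a missing or empty element into an empty string.
    name = attribute.findtext("AttrName") or ""
    value = attribute.findtext("AttrValue") or ""
    pairs.append((name, value))

print(pairs)
```

This yields one (name, value) tuple per Attribute element, with an empty string where AttrValue has no content.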