
Issue Parsing XML using Python Minidom


I am trying to parse an XML file using Python's xml.dom.minidom. I only want to return the first name (<wd:First_Name>) and last name (<wd:Last_Name>) from the legal name data (<wd:Legal_Name_Data>). I do not want the first or last name from <wd:Preferred_Name_Data> or from any additional name data in each record. Below is an example of the XML file I am trying to parse. This is just one of many records from which I need to retrieve this data.

<?xml version="1.0" encoding="UTF-8"?>
<env:Envelope xmlns:env="http://schema.xmlsoap.org/soap/envelope/">
    <env:Body>
        <wd:Get_Working_Response xmlns:wd="urn:com.workway/bsvc"
                                 wd:version="v40.1">
            <wd:Request_Criteria>
                <wd:Transaction_Log_Criteria_Data>
                </wd:Transaction_Log_Criteria_Data>
                <wd:Field_And_Parameter_Criteria_Data>
                </wd:Field_And_Parameter_Criteria_Data>
                <wd:Eligibility_Criteria_Data>
                </wd:Eligibility_Criteria_Data>
            </wd:Request_Criteria>
            <wd:Response_Filter>
            </wd:Response_Filter>
            <wd:Response_Group>
            </wd:Response_Group>
            <wd:Response_Results>
            </wd:Response_Results>
            <wd:Response_Data>
                <wd:Worker>
                    <wd:Worker_Reference>
                        <wd:ID wd:type="WID">787878787878787</wd:ID>
                        <wd:ID wd:type="Employee_ID">123456</wd:ID>
                    </wd:Worker_Reference>
                    <wd:Worker_Descriptor>John Smith</wd:Worker_Descriptor>
                    <wd:Worker_Data>
                        <wd:Worker_ID>123456</wd:Worker_ID>
                        <wd:User_ID>jsmith</wd:User_ID>
                        <wd:Personal_Data>
                            <wd:Name_Data>
                                <wd:Legal_Name_Data>
                                    <wd:Name_Detail_Data wd:Formatted_Name="John J. Smith"
                                                         wd:Reporting_Name="John J. Smith">
                                        <wd:Country_Reference>
                                            <wd:ID wd:type="WID">89989898989989898</wd:ID>
                                            <wd:ID wd:type="ISO_3166-1_Alpha-2_Code">US</wd:ID>
                                            <wd:ID wd:type="ISO_3166-1_Alpha-3_Code">USA</wd:ID>
                                            <wd:ID wd:type="ISO_3166-1_Numeric-3_Code">000000</wd:ID>
                                        </wd:Country_Reference>
                                        <wd:First_Name>John</wd:First_Name>
                                        <wd:Middle_Name>J.</wd:Middle_Name>
                                        <wd:Last_Name>Smith</wd:Last_Name>
                                    </wd:Name_Detail_Data>
                                </wd:Legal_Name_Data>
                                <wd:Preferred_Name_Data>
                                    <wd:Name_Detail_Data wd:Formatted_Name="Johnny James Smith"
                                                         wd:Reporting_Name="Johnny James Smith">
                                        <wd:Country_Reference>
                                            <wd:ID wd:type="WID">89989898989989898</wd:ID>
                                            <wd:ID wd:type="ISO_3166-1_Alpha-2_Code">US</wd:ID>
                                            <wd:ID wd:type="ISO_3166-1_Alpha-3_Code">USA</wd:ID>
                                            <wd:ID wd:type="ISO_3166-1_Numeric-3_Code">000000</wd:ID>
                                        </wd:Country_Reference>
                                        <wd:First_Name>Johnny</wd:First_Name>
                                        <wd:Middle_Name>James</wd:Middle_Name>
                                        <wd:Last_Name>Smith</wd:Last_Name>
                                    </wd:Name_Detail_Data>
                                </wd:Preferred_Name_Data>
                                <wd:Additional_Name_Data>
                                    <wd:Name_Detail_Data wd:Formatted_Name="John J. Smith"
                                                         wd:Reporting_Name="John J. Smith">
                                        <wd:Country_Reference>
                                            <wd:ID wd:type="WID">89989898989989898</wd:ID>
                                            <wd:ID wd:type="ISO_3166-1_Alpha-2_Code">US</wd:ID>
                                            <wd:ID wd:type="ISO_3166-1_Alpha-3_Code">USA</wd:ID>
                                            <wd:ID wd:type="ISO_3166-1_Numeric-3_Code">840</wd:ID>
                                        </wd:Country_Reference>
                                        <wd:First_Name>John</wd:First_Name>
                                        <wd:Middle_Name>J.</wd:Middle_Name>
                                        <wd:Last_Name>Smith</wd:Last_Name>
                                    </wd:Name_Detail_Data>
                                    <wd:Name_Type_Reference>
                                        <wd:ID wd:type="WID">89989898989989898</wd:ID>
                                        <wd:ID wd:type="Additional_Name_Type_ID">Preferred</wd:ID>
                                    </wd:Name_Type_Reference>
                                </wd:Additional_Name_Data>
                            </wd:Name_Data>

So far, I have tried the code below, but it returns every first name and last name in each record, not just the legal ones. How can I restrict the search to <wd:Legal_Name_Data>?

from xml.dom import minidom

doc = minidom.parse('myfile.xml')

firstlist = []
lastlist = []

first = doc.getElementsByTagName('wd:First_Name')
for name in first:
    first2 = name.firstChild.nodeValue
    firstlist.append(first2)

last = doc.getElementsByTagName('wd:Last_Name')
for lasts in last:
    last2 = lasts.firstChild.nodeValue
    lastlist.append(last2)

Thanks, Nick


Solution

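  • getElementsByTagName searches every descendant of the node it is called on, so calling it on doc matches the names under <wd:Preferred_Name_Data> and <wd:Additional_Name_Data> as well. You need to extract the <wd:Legal_Name_Data> elements first and then search inside each one, like this: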

    # doc is the document already parsed in the question
    firstlist = []
    lastlist = []

    # Find every legal-name block, then search only inside each block
    legal_names = doc.getElementsByTagName('wd:Legal_Name_Data')
    for legal_name in legal_names:
        first = legal_name.getElementsByTagName('wd:First_Name')
        for name in first:
            first2 = name.firstChild.nodeValue
            firstlist.append(first2)

        last = legal_name.getElementsByTagName('wd:Last_Name')
        for lasts in last:
            last2 = lasts.firstChild.nodeValue
            lastlist.append(last2)
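
    If you would rather keep each worker's first and last name together, and not depend on the document using the literal wd: prefix, one possible variation is sketched below. It is only a sketch under a few assumptions: the namespace URI is copied from the xmlns:wd declaration in the sample, SAMPLE is a trimmed stand-in for the real file, and text_of and legal_names are hypothetical helper names, not part of minidom. It uses getElementsByTagNameNS, which matches on namespace URI plus local name.

```python
from xml.dom import minidom

# Namespace URI copied from the xmlns:wd declaration in the question's sample
WD_NS = "urn:com.workway/bsvc"

# Trimmed stand-in for the real file (assumption: the real records follow
# the same Legal_Name_Data / Preferred_Name_Data layout)
SAMPLE = """<wd:Name_Data xmlns:wd="urn:com.workway/bsvc">
  <wd:Legal_Name_Data>
    <wd:Name_Detail_Data>
      <wd:First_Name>John</wd:First_Name>
      <wd:Last_Name>Smith</wd:Last_Name>
    </wd:Name_Detail_Data>
  </wd:Legal_Name_Data>
  <wd:Preferred_Name_Data>
    <wd:Name_Detail_Data>
      <wd:First_Name>Johnny</wd:First_Name>
      <wd:Last_Name>Smith</wd:Last_Name>
    </wd:Name_Detail_Data>
  </wd:Preferred_Name_Data>
</wd:Name_Data>
"""


def text_of(parent, local_name):
    """Return the text of the first matching descendant, or None if it is absent or empty."""
    nodes = parent.getElementsByTagNameNS(WD_NS, local_name)
    if not nodes or nodes[0].firstChild is None:
        return None
    return nodes[0].firstChild.nodeValue


def legal_names(doc):
    """Return one (first, last) tuple per Legal_Name_Data element."""
    return [
        (text_of(legal, "First_Name"), text_of(legal, "Last_Name"))
        for legal in doc.getElementsByTagNameNS(WD_NS, "Legal_Name_Data")
    ]


doc = minidom.parseString(SAMPLE)
print(legal_names(doc))  # [('John', 'Smith')]
```

    For the real file you would call minidom.parse('myfile.xml') instead of parseString. The None check in text_of is there because firstChild is None for an empty element such as <wd:First_Name/>, which would otherwise raise an AttributeError.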