Search code examples
pythonxml-parsingelementtree

Can't get find to access element in XPath in python xml.etree.ElementTree


I've been going in circles. My task is to read in an xml document. Parse the record to find an ID. Use the ID to execute some SQL. Then compare the value of the xml to the values of the SQL. If the SQL is different (and not null), we update the XML to be sent back to another server.

All is working except that when I am trying to get the value of a field in an XML record, it is not finding.

I have create two dictionaries of xml mapping:

xml_sql_mapping_sa = {
    'ncaaId': 'RZECAST_KEYNCAAID',
    'schoolSid': 'SPRIDEN_ID',
    'birthDate': 'BIRTHDATE',
    'email': 'EMAIL',
    'ethnicCode': 'ETHNICODE',
    'firstName': 'FIRSTNAME',
    'lastName': 'LASTNAME',
    'MI': 'MI',
    'gender': 'GENDER',
    'primaryPhone': 'PRIMARY_PHONE'
}

# Define a mapping between XML fields and SQL columns for <parent> elements
xml_sql_mapping_parent = {
    'email': 'PARENT_EMAIL',
    'primaryPhone': 'PARENT_PHONE'
}

Here is the structure of the XML:

<students>
<sa birthDate="2005-##-####" email="[email protected]" ethnicCode="5" firstName="L____" gender="F" lastName="A____d" ncaaId="211123456908" primaryPhone="208-555-5555" schoolSid="020111126">
<saDetail fulltimeEnrollmentTermAny="S1" fulltimeEnrollmentTermHere="S1" fulltimeEnrollmentYearAny="2024" fulltimeEnrollmentYearHere="2024" internationalFlag="N"/>
<address address1="678 Address" city="theCity" country="US" postalCode="83204" state="ID"/>
<parent email="[email protected]" name="first Name" primaryPhone="25555541522"/>
<saPreFte hoursCode="ADVANCED_PLACEMENT"/>
<saPreFte degreeApplicableHours="27.0" earnedHours="27.0" hoursCode="CREDIT_BEFORE_FULL_TIME"/>
<saPreFte hoursCode="CREDIT_BY_EXAM"/>
<saPreFte hoursCode="SUMMER_BRIDGE"/>
<saYear academicYear="2024">
<saYearSport sportCode="WGO"/>
<saYearEligible financialAidCertDate="2023-08-09" medicalDate="2023-08-08"/>
<saYearTerm termCode="S1"/>
<saYearTerm termCode="S2"/>
<saYearTerm termCode="SU"/>
<saYearPtd classYear="1"/>
</saYear>
</sa>
</students>

Here is the relevant code:

            row = cursor.fetchone()

            # Check if there are rows to update XML
            if row:
                # Get the column names from cursor description
                column_names = [desc[0] for desc in cursor.description]

                # Create a dictionary to map column names to values (uppercase column names)
                row_data = dict(zip(map(str.upper, column_names), row))
                print(row_data)

                # Determine whether to update <sa> or <parent> elements based on tag
                if record.tag == 'sa':
                    xml_sql_mapping = xml_sql_mapping_sa
                elif record.tag == 'parent':
                    xml_sql_mapping = xml_sql_mapping_parent
                else:
                    print(f"Unsupported XML element tag: {record.tag}")
                    continue

                # Loop through the XML fields and update if necessary
                for xml_field, sql_column in xml_sql_mapping.items():
                    if sql_column in row_data:
                        sql_value = row_data[sql_column]
                        xml_element = record.find(xml_field)
                        print(f"The xml_field is {xml_field}.")
                        print(f"XML element is {xml_element}")

                        # Check if the XML element exists
                        if xml_element is not None:
                            # Check if the SQL value is not None and different from XML value
                            if sql_value is not None and xml_element.text != sql_value:
                                xml_element.text = sql_value
                        else:
                            print(f"XML element {xml_field} not found in the <{record.tag}> element.")

            else:
                print(f"No record found for student ID: {v_student_ID}")

        finally:
            # Close the cursor for each student
            cursor.close()

Everything works, except that it doesn't find the xml element. Here is a portion of my print statements. (data is changed)

SchoolSid: 020194486
{'RZECAST_KEYNCAAID': '151104', 'SPRIDEN_ID': '123456', 'BIRTHDATE': '2004-05-23', 'EMAIL': '[email protected]', 'ETHNICODE': '5', 'FIRSTNAME': 'A____', 'LASTNAME': 'A___', 'MI': None, 'GENDER': 'F', 'PREFERREDNAME': None, 'PRIMARY_PHONE': '406-555-5555'}
The xml_field is ncaaId.
XML element is None
XML element ncaaId not found in the <sa> element.
The xml_field is schoolSid.
XML element is None 

The relevant line of code is this: xml_element = record.find(xml_field)

xml_element should populate with the value from the xml, but it finds nothing.


Solution

  • The xml_field is ncaaId.
    XML element is None
    XML element ncaaId not found in the <sa> element.
    

    ncaaId isn't an element. It's an attribute of the sa element.

    So you'd probably want to do something like (maybe rename the variable too since you're not selecting an element):

    xml_element = record.get(xml_field)