
ElementTree cannot find one of the child elements


I have looked at other questions about this issue and none of them helped me. I'm parsing an XML file with ElementTree and I am having a problem finding a specific tag, which may be optional, while I can find other (optional) tags without issue.

Relevant XML snippet:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ns4:ExportActualMedicines xmlns="urn:be:fgov:ehealth:samws:v2:actual:common" ... xmlns:ns4="urn:be:fgov:ehealth:samws:v2:export" xmlns:ns5="urn:be:fgov:ehealth:samws:v2:refdata" ... version="6.0" SamId="E.20240325_150005">
    <ns4:Amp code="SAM660978-00">
        <ns4:Ampp ctiExtended="660978-01">
            <ns4:Data from="2023-01-10" to="2023-01-31">
                <AuthorisationNr>HO-BE-UH660978</AuthorisationNr>
                <ParallelCircuit>0</ParallelCircuit>
                <PackDisplayValue>
                    ...
                </PackDisplayValue>
                <Status>AUTHORIZED</Status>
                ...
            </ns4:Data>
        </ns4:Ampp>
    </ns4:Amp>
</ns4:ExportActualMedicines>

Now, all of the other tags under "ns4:Data" are found without issues, but "AuthorisationNr" never seems to be found.

Relevant snippet of my Python code:

import xml.etree.ElementTree as ET

NS4 = '{urn:be:fgov:ehealth:samws:v2:export}'
XMLNS = '{urn:be:fgov:ehealth:samws:v2:actual:common}'

tree = ET.parse(file)
root = tree.getroot()
for amp in root.findall(f'{NS4}Amp'):
    for item in amp:
        if item.tag == f'{NS4}Ampp':
           ampp = {'code': item.attrib['ctiExtended'],
                   'data': []}

           for elem in item:
               if elem.tag == f'{NS4}Data':
                   authorisation_number = elem.find(f'{XMLNS}AuthorisationNr')
                   parallel_circuit = elem.find(f'{XMLNS}ParallelCircuit')
                   pack_display_value = elem.find(f'{XMLNS}PackDisplayValue')
                   ampp['data'].append({
                     'from': elem.attrib['from'],
                     'to': elem.attrib['to'] if 'to' in elem.attrib else None,
                     'authorisation_number': authorisation_number if authorisation_number else None,
                     'pack_display_value': pack_display_value[0].text if pack_display_value else None,
                     'parallel_circuit': parallel_circuit.text if parallel_circuit else None
                     ...
                   })

When I simply iterate over all child elements of "ns4:Data", "AuthorisationNr" shows up.

if elem.tag == f'{NS4}Data':
    for data in elem:
        print(data)

---
Output:
<Element '{urn:be:fgov:ehealth:samws:v2:actual:common}AuthorisationNr' at 0x00000212DB3235B0>
...
<Element '{urn:be:fgov:ehealth:samws:v2:actual:common}ParallelCircuit' at 0x00000212DB323D80>
<Element '{urn:be:fgov:ehealth:samws:v2:actual:common}PackDisplayValue' at 0x00000212DB323E20>
...

I even tried copying and pasting the element's full name into the find() call, but it still couldn't find it. Can anybody help me?


Solution

  • The element should be reachable with an XPath expression directly from the root:

    import xml.etree.ElementTree as ET
    
    xml_str="""<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <ns4:ExportActualMedicines xmlns="urn:be:fgov:ehealth:samws:v2:actual:common" xmlns:ns4="urn:be:fgov:ehealth:samws:v2:export" xmlns:ns5="urn:be:fgov:ehealth:samws:v2:refdata" version="6.0" SamId="E.20240325_150005">
        <ns4:Amp code="SAM660978-00">
            <ns4:Ampp ctiExtended="660978-01">
                <ns4:Data from="2023-01-10" to="2023-01-31">
                    <AuthorisationNr>HO-BE-UH660978</AuthorisationNr>
                    <ParallelCircuit>0</ParallelCircuit>
                    <PackDisplayValue>
                    </PackDisplayValue>
                    <Status>AUTHORIZED</Status>
                </ns4:Data>
            </ns4:Ampp>
        </ns4:Amp>
    </ns4:ExportActualMedicines>"""
    
    root = ET.fromstring(xml_str)
    
    authorisation_number = root.find('.//{urn:be:fgov:ehealth:samws:v2:actual:common}AuthorisationNr').text
    print(authorisation_number)
    

    Output:

    HO-BE-UH660978
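
  • A likely reason the original code reports the tag as missing, even though find() does return it: the question's code tests the result with "if authorisation_number", and ElementTree has historically treated an element with no child elements as false (the Python documentation recommends an explicit "is not None" test instead). AuthorisationNr has no children, so the conditional falls through to None, while PackDisplayValue has children and passes. The sketch below is one way to handle this, not the only one; the prefix labels in NS and the helper text_or_none are names made up for this illustration, and the XML is a trimmed copy of the snippet from the question.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the XML from the question.
XML = """<ns4:ExportActualMedicines xmlns="urn:be:fgov:ehealth:samws:v2:actual:common" xmlns:ns4="urn:be:fgov:ehealth:samws:v2:export">
    <ns4:Amp code="SAM660978-00">
        <ns4:Ampp ctiExtended="660978-01">
            <ns4:Data from="2023-01-10" to="2023-01-31">
                <AuthorisationNr>HO-BE-UH660978</AuthorisationNr>
                <ParallelCircuit>0</ParallelCircuit>
            </ns4:Data>
        </ns4:Ampp>
    </ns4:Amp>
</ns4:ExportActualMedicines>"""

# The prefix names are arbitrary labels chosen for this sketch;
# only the namespace URIs have to match the document.
NS = {
    "common": "urn:be:fgov:ehealth:samws:v2:actual:common",
    "export": "urn:be:fgov:ehealth:samws:v2:export",
}


def text_or_none(parent, path):
    """Hypothetical helper: text of the first match, or None if the tag is absent."""
    found = parent.find(path, NS)
    # Compare against None explicitly instead of relying on element truthiness.
    return found.text if found is not None else None


root = ET.fromstring(XML)
data = root.find("export:Amp/export:Ampp/export:Data", NS)

authorisation_number = text_or_none(data, "common:AuthorisationNr")
parallel_circuit = text_or_none(data, "common:ParallelCircuit")
status = text_or_none(data, "common:Status")  # not present in this trimmed XML

print(authorisation_number, parallel_circuit, status)
```

    Applied to the code in the question, the same idea would mean replacing each "x if x else None" style check with "x.text if x is not None else None", so that optional tags still come back as None and leaf elements are no longer dropped.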