python, python-3.x, xml, xml.etree

XML ElementTree Python: Find all the relations of a node


Suppose we have the following XML file:

<XML Data>
    <Record>
        <Service>
            <Product id="A"></Product>
            <Product id="B"></Product>
            <Product id="C"></Product>
        </Service>
    </Record>
    <Record>
        <Service>
            <Product id="A"></Product>
            <Product id="B"></Product>
            <Product id="Y"></Product>
        </Service>
    </Record>
    <Record>
        <Service>
            <Product id="U"></Product>
        </Service>
    </Record>
</XML Data>
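As written, the root tag <XML Data> is not well-formed XML: an element name cannot contain a space, so the parser reads Data as an attribute with no value and raises a parse error. A minimal check, assuming the data is parsed from a string with ET.fromstring (the BROKEN/FIXED strings and the is_well_formed helper are illustrative, not part of the original post):

```python
import xml.etree.ElementTree as ET

# A trimmed copy of the question's data, once with the original root tag
# and once with a space-free <Data> root.
BROKEN = "<XML Data><Record><Service><Product id='A'/></Service></Record></XML Data>"
FIXED = "<Data><Record><Service><Product id='A'/></Service></Record></Data>"


def is_well_formed(text):
    """Return True if ElementTree can parse the text, False otherwise."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed(BROKEN))  # False: "Data" is read as an attribute without a value
print(is_well_formed(FIXED))   # True
```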

As you can see, each record represents a single client, but there is no unique identifier. Each service can contain multiple products.

I want to get all products that have been sold together with product A, so I am trying to get a list like this:

ServiceID
B
C
Y
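Since the title asks for all the relations of a node, the same idea can be generalised: for every product, collect the IDs that appear in at least one Service alongside it. A sketch, assuming the root tag is renamed to <Data> so the file parses; co_sold_products is a hypothetical helper name:

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# The question's data with the root renamed to <Data> so that it parses.
XML_TEXT = """<Data>
    <Record><Service>
        <Product id="A"/><Product id="B"/><Product id="C"/>
    </Service></Record>
    <Record><Service>
        <Product id="A"/><Product id="B"/><Product id="Y"/>
    </Service></Record>
    <Record><Service>
        <Product id="U"/>
    </Service></Record>
</Data>"""


def co_sold_products(root):
    """Map each product ID to the set of IDs sold in the same Service."""
    relations = defaultdict(set)
    for service in root.iterfind("./Record/Service"):
        ids = {product.get("id") for product in service.iterfind("./Product")}
        for pid in ids:
            # every other product in this Service is related to pid
            relations[pid] |= ids - {pid}
    return relations


relations = co_sold_products(ET.fromstring(XML_TEXT))
print(sorted(relations["A"]))  # ['B', 'C', 'Y']
print(sorted(relations["U"]))  # []
```

Looking up relations["A"] then gives exactly the list above, and a product that was never sold with anything (U) maps to an empty set.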

I've been using:

import xml.etree.ElementTree as ET

Solution

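  • You can select elements by attribute value with the [@attrib='value'] predicate, as described in the official documentation. When testing this, I replaced your <XML Data> and </XML Data> tags with <Data> and </Data>, because an element name cannot contain a space, so the original file is not well-formed XML. Example code: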

    from xml.etree import ElementTree as ET
    
    data = ET.parse(r"/path/to/your/input.xml")
    root = data.getroot()
    # select every Product whose id attribute equals 'A'
    for product in root.findall("./Record/Service/Product[@id='A']"):
        print(product.attrib["id"])
        print(product.text)  # None here, since the Product elements are empty
    

    Edit

    After reading your question again, I noticed that you first want to check whether a Service contains a product with ID A, and only then collect the other IDs (unique and sorted), so I adapted the code:

    from xml.etree import ElementTree as ET
    
    data = ET.parse(r"/path/to/your/input.xml")
    root = data.getroot()
    product_ids = set()
    for service in root.findall("./Record/Service"):
        list_contains_a = False
    
        # iterate once to identify if list contains product with ID = 'A'
        for product in service.findall("./Product"):
            if product.attrib["id"] == "A":
                list_contains_a = True
                break  # no need to look further once 'A' is found
    
        # if list contains product with ID = 'A', iterate second time and fetch IDs
        if list_contains_a:
            for product in service.findall("./Product"):
                if product.attrib["id"] == "A":
                    continue
    
                # add to set to prevent duplicates
                product_ids.add(product.attrib["id"])
    
    ret_list = ["ServiceID"] + sorted(product_ids)  # sorted() already returns a list
    print(ret_list)
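The two snippets can also be combined: the [@id='...'] predicate from the first example can test whether a Service contains the target product, replacing the first inner loop. ElementTree supports only a subset of XPath, so the check is done per Service with find(). A sketch, assuming the renamed <Data> root and product IDs without quote characters; products_sold_with and target are illustrative names:

```python
import xml.etree.ElementTree as ET

XML_TEXT = """<Data>
    <Record><Service>
        <Product id="A"/><Product id="B"/><Product id="C"/>
    </Service></Record>
    <Record><Service>
        <Product id="A"/><Product id="B"/><Product id="Y"/>
    </Service></Record>
    <Record><Service>
        <Product id="U"/>
    </Service></Record>
</Data>"""


def products_sold_with(root, target):
    """Return the sorted IDs of products that share a Service with target.

    Assumes product IDs contain no quote characters, because target is
    inserted directly into the XPath predicate.
    """
    related = set()
    for service in root.iterfind("./Record/Service"):
        # find() returns None when no Product in this Service has id=target
        if service.find(f"./Product[@id='{target}']") is None:
            continue
        related.update(
            product.get("id")
            for product in service.iterfind("./Product")
            if product.get("id") != target
        )
    return sorted(related)


root = ET.fromstring(XML_TEXT)
print(["ServiceID"] + products_sold_with(root, "A"))
# ['ServiceID', 'B', 'C', 'Y']
```

With a real file, ET.parse(path).getroot() can replace ET.fromstring(XML_TEXT).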