
Get attributes and iterate only within each element with ElementTree


I have this XML:

<DEFINITION xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="Fol.xsd">
    <FOLDER SERVER="CTMAFB" VERSION="918" SO="UNIX" FOLDER_NAME="FOLDER_ONE" MODIFIED="False" LAST_UPLOAD="20220518084048UTC" FOLDER_ORDER_METHOD="SYSTEM" REAL_FOLDER_ID="2" TYPE="1" USED_BY_CODE="0">
        <JOB ID="256" APP="APP" SUB_APP="SUBAPP" JOBNAME="JOBA" CREATED_BY="emuser" RUN_AS="root" CRITICAL="0" CREATION_DATE="20190916" CREATION_TIME="120730" PARENT_FOLDER="FOLDER_ONE">
            <SHOUT WHEN="DDD" TIME="1825"/>
        </JOB>
        <JOB ID="263" APP="APP" SUB_APP="SUBAPP" JOBNAME="JOBB" CREATION_TIME="174238" PARENT_FOLDER="FOLDER_ONE">
        </JOB>
    </FOLDER>
    <FOLDER SERVER="CTMAFB" VERSION="918" SO="UNIX" FOLDER_NAME="FOLDER_TWO" MODIFIED="False" LAST_UPLOAD="20220611092853UTC" REAL_FOLDER_ID="589" TYPE="1" USED_BY_CODE="0">
        <JOB ID="2" APP="APP" SUB_APP="SUB" JOBNAME="JOBC" CREATION_DATE="20220611" VPARENT_FOLDER="FOLDER_TWO" />
        <JOB ID="3" APP="APP" SUB_APP="SUB" JOBNAME="JOBD" CREATION_DATE="20220611" CREATION_TIME="102504" CHANGE_USERID="ESY9C4DB" CHANGE_DATE="20220611"  PARENT_FOLDER="FOLDER_TWO" />
    </FOLDER>
</DEFINITION>

As you can see, there are two folders (FOLDER_NAME), each containing two jobs.

I'm trying to get them with this code:

for nodes in tree.iter('FOLDER'):   
    nameFolder = nodes.attrib.get('FOLDER_NAME')
    print('NameFolder is ...' + nameFolder)
    
    for nodes in tree.iter('JOB'):       
        name = nodes.attrib.get('JOBNAME')
        print('NAMEEEE .... ' + name)

But with that code I get:

NameFolder is ...FOLDER_ONE
NAMEEEE .... JOBA
NAMEEEE .... JOBB
NAMEEEE .... JOBC
NAMEEEE .... JOBD
NameFolder is ...FOLDER_TWO
NAMEEEE .... JOBA
NAMEEEE .... JOBB
NAMEEEE .... JOBC
NAMEEEE .... JOBD

What I need is:

NameFolder is ...FOLDER_ONE
NAMEEEE .... JOBA
NAMEEEE .... JOBB
NameFolder is ...FOLDER_TWO
NAMEEEE .... JOBC
NAMEEEE .... JOBD

Any help would be appreciated. Thanks!


Solution

  • Listing [Python.Docs]: xml.etree.ElementTree - The ElementTree XML API.

    The problem is that your inner loop iterates over the whole tree, when it should only iterate over the current FOLDER node (nodes). There is also a name conflict: the inner loop reuses nodes as its loop variable, which shadows the folder node and makes the code confusing.
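    To see the scoping difference in isolation, here is a minimal sketch on a trimmed-down copy of the XML (only FOLDER_NAME and JOBNAME are kept; the variable names buggy and fixed are just illustrative). Both ElementTree.iter() and Element.iter() walk the object they are called on plus all of its descendants, so the result depends entirely on which object the inner loop starts from:

```python
from xml.etree import ElementTree as ET

# Trimmed-down version of the question's XML (other attributes omitted)
XML = """
<DEFINITION>
    <FOLDER FOLDER_NAME="FOLDER_ONE">
        <JOB JOBNAME="JOBA"/>
        <JOB JOBNAME="JOBB"/>
    </FOLDER>
    <FOLDER FOLDER_NAME="FOLDER_TWO">
        <JOB JOBNAME="JOBC"/>
        <JOB JOBNAME="JOBD"/>
    </FOLDER>
</DEFINITION>
"""

# Same kind of object that ET.parse() returns for a file
tree = ET.ElementTree(ET.fromstring(XML))

# Question's scope: tree.iter() walks the whole document,
# so every folder "sees" all four jobs
buggy = {
    folder.get("FOLDER_NAME"): [job.get("JOBNAME") for job in tree.iter("JOB")]
    for folder in tree.iter("FOLDER")
}

# Fixed scope: folder.iter() only walks that folder's subtree
fixed = {
    folder.get("FOLDER_NAME"): [job.get("JOBNAME") for job in folder.iter("JOB")]
    for folder in tree.iter("FOLDER")
}

print(buggy)
print(fixed)
```

    Element.get() is a shorthand for attrib.get(), so either form works here.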

    code00.py:

    #!/usr/bin/env python
    
    import sys
    from xml.etree import ElementTree as ET
    
    
    xml_blob = """
    <DEFINITION xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="Fol.xsd">
        <FOLDER SERVER="CTMAFB" VERSION="918" SO="UNIX" FOLDER_NAME="FOLDER_ONE" MODIFIED="False" LAST_UPLOAD="20220518084048UTC" FOLDER_ORDER_METHOD="SYSTEM" REAL_FOLDER_ID="2" TYPE="1" USED_BY_CODE="0">
            <JOB ID="256" APP="APP" SUB_APP="SUBAPP" JOBNAME="JOBA" CREATED_BY="emuser" RUN_AS="root" CRITICAL="0" CREATION_DATE="20190916" CREATION_TIME="120730" PARENT_FOLDER="FOLDER_ONE">
                <SHOUT WHEN="DDD" TIME="1825"/>
            </JOB>
            <JOB ID="263" APP="APP" SUB_APP="SUBAPP" JOBNAME="JOBB" CREATION_TIME="174238" PARENT_FOLDER="FOLDER_ONE">
            </JOB>
        </FOLDER>
        <FOLDER SERVER="CTMAFB" VERSION="918" SO="UNIX" FOLDER_NAME="FOLDER_TWO" MODIFIED="False" LAST_UPLOAD="20220611092853UTC" REAL_FOLDER_ID="589" TYPE="1" USED_BY_CODE="0">
            <JOB ID="2" APP="APP" SUB_APP="SUB" JOBNAME="JOBC" CREATION_DATE="20220611" VPARENT_FOLDER="FOLDER_TWO" />
            <JOB ID="3" APP="APP" SUB_APP="SUB" JOBNAME="JOBD" CREATION_DATE="20220611" CREATION_TIME="102504" CHANGE_USERID="ESY9C4DB" CHANGE_DATE="20220611"  PARENT_FOLDER="FOLDER_TWO" />
        </FOLDER>
    </DEFINITION>
    """
    
    
    def main(*argv):
        root = ET.fromstring(xml_blob)
        for folder_node in root.iter("FOLDER"):
            print(folder_node.attrib.get("FOLDER_NAME"))
            for job_node in folder_node.iter("JOB"):  # Iterate on folder_node, NOT root
                print("  ", job_node.attrib.get("JOBNAME"))
    
    
    if __name__ == "__main__":
        print("Python {:s} {:03d}bit on {:s}\n".format(" ".join(elem.strip() for elem in sys.version.split("\n")),
                                                       64 if sys.maxsize > 0x100000000 else 32, sys.platform))
        rc = main(*sys.argv[1:])
        print("\nDone.")
        sys.exit(rc)
    

    Output:

    [cfati@CFATI-5510-0:e:\Work\Dev\StackOverflow\q072586383]> "e:\Work\Dev\VEnvs\py_pc064_03.09_test0\Scripts\python.exe" ./code00.py
    Python 3.9.9 (tags/v3.9.9:ccb0e6a, Nov 15 2021, 18:08:50) [MSC v.1929 64 bit (AMD64)] 064bit on win32
    
    FOLDER_ONE
       JOBA
       JOBB
    FOLDER_TWO
       JOBC
       JOBD
    
    Done.
    
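    One thing to keep in mind: iter() is recursive, so folder_node.iter("JOB") would also pick up JOB elements nested deeper inside a folder. If only the direct children of each FOLDER should count, findall("JOB") restricts the match to that level. A minimal sketch, using a hypothetical GROUP wrapper that does not appear in the question's data and is only there to show the difference:

```python
from xml.etree import ElementTree as ET

# Hypothetical XML: the GROUP element is NOT in the question's data,
# it only exists to show how iter() and findall() differ
XML = """
<FOLDER FOLDER_NAME="FOLDER_ONE">
    <JOB JOBNAME="JOBA"/>
    <GROUP>
        <JOB JOBNAME="NESTED"/>
    </GROUP>
</FOLDER>
"""

folder = ET.fromstring(XML)

# iter() is recursive: yields matching descendants at any depth, in document order
all_jobs = [job.get("JOBNAME") for job in folder.iter("JOB")]

# findall("JOB") only matches direct children of folder
direct_jobs = [job.get("JOBNAME") for job in folder.findall("JOB")]

print(all_jobs)     # ['JOBA', 'NESTED']
print(direct_jobs)  # ['JOBA']
```

    For the XML in the question both calls give the same result, since every JOB is a direct child of its FOLDER.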

    Personally, I prefer XPath when traversing XML trees: