Tags: python, xml, lxml, lxml.objectify

How can I read 3 kinds of XML files using only one function?


Suppose I have the following three kinds of XML files:

file1.xml

<memberdef>
    <param>
        <type>abc12300xyz__param -> type</type>
    </param>
</memberdef>

file2.xml

<memberdef>
    <param>
        <type>abc12300xyz__param -> type</type>
        <declname>abc12300xyz__param -> declname</declname>
    </param>
</memberdef>

file3.xml

<memberdef>
    <param>
        <type>
            <ref refid="abc12300xyz__refid" kindref="abc12300xyz__kindref">abc12300xyz -> ref</ref>
        </type>
        <declname>abc12300xyz__param -> declname</declname>
    </param>
</memberdef>

I want to read these three files using lxml.

How can I test which kind of file has been loaded?

For instance, when either file1.xml or file2.xml is loaded, the following code fails, because in those files the type element has no ref child:

if memberdef.param.type.ref != None:
    ... ... ...
    ... ... ...

What tactic should I use in this case?


Solution

  • You can use a simple XPath check for element existence (replace fileX.xml with the actual file name). Each XPath expression returns a non-empty result only if the named elements are present in the given XML file. In the example below, the tests go from the most specific to the most general:

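    from lxml import etree

    print("Checking variants...")
    # etree.parse() returns an ElementTree, which also supports xpath()
    tree = etree.parse("fileX.xml")
    # Most specific check first: type contains a ref, and declname is present
    if tree.xpath('/memberdef[param[type/ref and declname]]'):
        print("Third variant.")
    # Next: type and declname are present (no ref required)
    elif tree.xpath('/memberdef[param[type and declname]]'):
        print("Second variant.")
    # Most general: only type is required
    elif tree.xpath('/memberdef/param[type]'):
        print("First variant.")
    else:
        print("None of the given variants.")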
    

    In other words:

    • the first if checks whether the memberdef element has a param child that has both a type/ref child and a declname child.
    • the second if checks only whether the memberdef element has a param child that has both a type child and a declname child.
    • the third if checks whether the memberdef element has a param child that has a type child.

    Further variants can be handled the same way, as long as the more specific checks come before the more general ones.
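
Since the question asks for a single function, the same checks could be wrapped in one, roughly as in the sketch below. The function name detect_variant, the variant labels, and the inline sample documents are assumptions made for illustration; they are not part of the original answer. The sketch parses from a string with etree.fromstring() so that it runs without any files on disk.

```python
from lxml import etree

# Checks ordered from most specific to most general; the first match wins.
# The labels are hypothetical names chosen for this sketch.
VARIANT_CHECKS = [
    ("type_ref_and_declname", "/memberdef[param[type/ref and declname]]"),
    ("type_and_declname", "/memberdef[param[type and declname]]"),
    ("type_only", "/memberdef/param[type]"),
]


def detect_variant(xml_text):
    """Return the label of the first matching variant, or None if none match."""
    root = etree.fromstring(xml_text)
    for label, expression in VARIANT_CHECKS:
        # xpath() returns a list of matching elements; an empty list is falsy.
        if root.xpath(expression):
            return label
    return None


# Shortened stand-ins for file1.xml, file2.xml and file3.xml.
FILE1 = "<memberdef><param><type>t</type></param></memberdef>"
FILE2 = (
    "<memberdef><param><type>t</type>"
    "<declname>d</declname></param></memberdef>"
)
FILE3 = (
    "<memberdef><param><type><ref refid='r' kindref='k'>x</ref></type>"
    "<declname>d</declname></param></memberdef>"
)

if __name__ == "__main__":
    for name, text in [("file1", FILE1), ("file2", FILE2), ("file3", FILE3)]:
        print(name, "->", detect_variant(text))
```

To read from disk instead, etree.parse(path) could replace etree.fromstring(xml_text); the XPath expressions stay the same.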