python xml parsing xalan

Splitting an XML file into multiple files at given tags


I want to split an XML file into multiple files. My workstation is limited to Eclipse Mars with Xalan 2.7.1.

I can also use Python, but I have never used it before.

<?xml version="1.0" encoding="UTF-8"?>
<root>
    <row>
        <NAME>Doe</NAME>
        <FIRSTNAME>Jon</FIRSTNAME>
        <GENDER>M</GENDER>
    </row>
    <row>
        <NAME>Mustermann</NAME>
        <FIRSTNAME>Max</FIRSTNAME>
        <GENDER>M</GENDER>
    </row>
</root>

How can I transform it so that each output file looks like this?

<?xml version="1.0" encoding="UTF-8"?>
<root>
    <row>
        <NAME>Doe</NAME>
        <FIRSTNAME>Jon</FIRSTNAME>
        <GENDER>M</GENDER>
    </row>
</root>

I need each "row" in its own file, including the XML header. The data above is just an example. Most rows have 16 child elements, but the number varies.


Solution

  • Use Python ElementTree.

    Create a file, e.g. xmlsplitter.py, and add the code below. Here file.xml is your XML file, and the code assumes every row has a unique NAME element, because NAME is used as the output file name.

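    import xml.etree.ElementTree as ET

    # Walk the document and handle each element once it has been fully read.
    context = ET.iterparse('file.xml', events=('end', ))
    for event, elem in context:
        if elem.tag == 'row':
            # Name the output file after the row's NAME element.
            title = elem.find('NAME').text
            filename = title + ".xml"
            # The file is opened in binary mode, so write bytes, not str.
            with open(filename, 'wb') as f:
                f.write(b"<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n")
                f.write(ET.tostring(elem))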
    

    Run it with

    python xmlsplitter.py
    

    If the names are not unique, number the output files instead:

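    import xml.etree.ElementTree as ET

    # Walk the document and handle each element once it has been fully read.
    context = ET.iterparse('file.xml', events=('end', ))
    index = 0
    for event, elem in context:
        if elem.tag == 'row':
            # Number the output files 1.xml, 2.xml, ... in document order.
            index += 1
            filename = str(index) + ".xml"
            # The file is opened in binary mode, so write bytes, not str.
            with open(filename, 'wb') as f:
                f.write(b"<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n")
                f.write(ET.tostring(elem))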
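
    Note that the scripts above write the bare row element after the XML declaration, while the question asks for each row wrapped in its own root element. The following is a minimal sketch of one way to get that layout with ElementTree. The function name split_rows, its parameters and the numbered file names are assumptions made for illustration, not part of the original answer.

```python
import os
import xml.etree.ElementTree as ET


def split_rows(source, out_dir, row_tag="row", root_tag="root"):
    """Write each <row_tag> element of source to its own numbered file.

    Hypothetical helper: the name, parameters and numbering scheme are
    illustrative assumptions.
    """
    paths = []
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != row_tag:
            continue
        # Drop any whitespace recorded after the row in the source document.
        elem.tail = None
        # Wrap the row in a fresh root element to match the requested layout.
        wrapper = ET.Element(root_tag)
        wrapper.append(elem)
        path = os.path.join(out_dir, "{}.xml".format(len(paths) + 1))
        # write() adds the XML declaration and encodes the file as UTF-8.
        ET.ElementTree(wrapper).write(path, encoding="UTF-8", xml_declaration=True)
        paths.append(path)
    return paths
```

    Calling split_rows("file.xml", ".") would write 1.xml, 2.xml and so on into the current directory and return the list of paths. The row's child elements are copied as they are, so a varying number of children per row needs no special handling.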