
Can't properly remove nested XML tags in Python using xml.dom.minidom


I am trying to remove some nested tags from XML held in a string, using Python 3.8 and the built-in xml.dom.minidom. The result is surprising: the parser seems to remove only the first (opening) tag and leave the closing tag behind. Surely I am missing something, but I can't see what it is.

import xml.dom.minidom as xml

StringXML = "<root><test1><test2></test2></test1><test1><test2></test2></test1><test1><test2></test2></test1><test1><test2></test2></test1></root>"

a = xml.parseString(StringXML)
num = 0

while (a.getElementsByTagName('test2').length > num):
  if(a.getElementsByTagName('test2')[num]):

    a.getElementsByTagName('test2')[num].parentNode.removeChild(a.getElementsByTagName('test2')[num])
    a.getElementsByTagName('test2')[num].unlink()
  num = num +1

print(a.toxml())

Solution

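  • If you just want to remove all test2 elements, there is no need for a counter. Each call to getElementsByTagName('test2') returns a fresh list built from the current tree, so after a removal the index num no longer points at the element you just handled. Call it once and iterate over the returned items.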

    import xml.dom.minidom as xml
    
    StringXML = "<root><test1><test2></test2></test1><test1><test2></test2></test1><test1><test2></test2></test1><test1><test2></test2></test1></root>"
    
    a = xml.parseString(StringXML)
    
    for test2 in a.getElementsByTagName('test2'):
        test2.parentNode.removeChild(test2)
    
    # Add an empty text node so each test1 serializes as <test1></test1> rather than <test1/>
    for test1 in a.getElementsByTagName('test1'):
        test1.appendChild(a.createTextNode(''))
    
    print(a.toprettyxml())
    

    Output:

    <?xml version="1.0" ?>
    <root>
        <test1></test1>
        <test1></test1>
        <test1></test1>
        <test1></test1>
    </root>
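
As a rough illustration of the difference, the sketch below runs a condensed replay of the question's counter-based loop and a single-pass removal on the same input. The function names remove_with_counter and remove_all are hypothetical, introduced only for this example; the parent check and the unlink() call in remove_all are optional additions, not part of the answer above.

```python
import xml.dom.minidom as minidom

SOURCE = "<root>" + "<test1><test2></test2></test1>" * 4 + "</root>"


def remove_with_counter(doc):
    """Condensed replay of the question's loop (kept buggy on purpose)."""
    num = 0
    while doc.getElementsByTagName("test2").length > num:
        node = doc.getElementsByTagName("test2")[num]
        node.parentNode.removeChild(node)
        # The list is rebuilt by this call, so [num] is now the *next*
        # test2. unlink() does not detach it from its parent, so it
        # stays in the tree and is later skipped because num grows.
        doc.getElementsByTagName("test2")[num].unlink()
        num += 1


def remove_all(doc, tag_name):
    """Hypothetical helper: remove every element named tag_name."""
    # A single call gives a list that does not change while we iterate.
    for node in doc.getElementsByTagName(tag_name):
        parent = node.parentNode
        if parent is None:
            # Already detached, e.g. nested inside a removed element.
            continue
        parent.removeChild(node)
        node.unlink()  # optional: release the removed node's references


broken = minidom.parseString(SOURCE)
remove_with_counter(broken)
print(broken.documentElement.toxml())
# <root><test1/><test1><test2/></test1><test1/><test1><test2/></test1></root>

fixed = minidom.parseString(SOURCE)
remove_all(fixed, "test2")
print(fixed.documentElement.toxml())
# <root><test1/><test1/><test1/><test1/></root>
```

The leftover <test2/> in the first output is a complete empty element, equivalent to <test2></test2>, not a stray closing tag. Two of the four elements survive because the counter advances while the list shrinks.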