I am trying to remove some nested tags from XML represented as a string, using Python 3.8 and the built-in xml.dom.minidom. The result is surprising: the parser seems to remove only the first (opening) tag and leave the closing tag. Surely I am missing something, but I can't see what it is.
import xml.dom.minidom as xml
StringXML = "<root><test1><test2></test2></test1><test1><test2></test2></test1><test1><test2></test2></test1><test1><test2></test2></test1></root>"
a = xml.parseString(StringXML)
num = 0
while (a.getElementsByTagName('test2').length > num):
    if(a.getElementsByTagName('test2')[num]):
        a.getElementsByTagName('test2')[num].parentNode.removeChild(a.getElementsByTagName('test2')[num])
        a.getElementsByTagName('test2')[num].unlink()
    num = num +1
print(a.toxml())
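If you just want to remove all test2 elements, there is no need to increment a counter. Each call to getElementsByTagName('test2') builds a new list, so after a removal the index num points at a different element, and unlink() only clears a node's internal references without detaching it from its parent. Just iterate over the items returned by a single call to getElementsByTagName('test2').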
import xml.dom.minidom as xml
StringXML = "<root><test1><test2></test2></test1><test1><test2></test2></test1><test1><test2></test2></test1><test1><test2></test2></test1></root>"
a = xml.parseString(StringXML)
for test2 in a.getElementsByTagName('test2'):
    test2.parentNode.removeChild(test2)
# Need to add empty text node to get <test1></test1> serialization
for test1 in a.getElementsByTagName('test1'):
    test1.appendChild(a.createTextNode(''))
print(a.toprettyxml())
Output:
<?xml version="1.0" ?>
<root>
	<test1></test1>
	<test1></test1>
	<test1></test1>
	<test1></test1>
</root>
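If you need this in more than one place, the same idea can be wrapped in a small helper. This is only a sketch: the name remove_elements is made up for illustration, and it assumes you want to drop every element with the given tag anywhere in the document. It skips the empty text node, so emptied parents serialize as <test1/>.

```python
import xml.dom.minidom as minidom


def remove_elements(doc, tag_name):
    """Detach every element named tag_name from doc; return how many were removed.

    Hypothetical helper, not part of minidom.
    """
    removed = 0
    # One call gives one list; removing nodes while looping over it
    # does not shift the remaining items.
    for node in doc.getElementsByTagName(tag_name):
        if node.parentNode is None:
            # Already detached, e.g. it was nested inside a removed element.
            continue
        node.parentNode.removeChild(node)
        # Optional: unlink() frees the detached subtree. Call it only
        # after removeChild(), never instead of it.
        node.unlink()
        removed += 1
    return removed


source = "<root><test1><test2></test2></test1><test1><test2></test2></test1></root>"
doc = minidom.parseString(source)
count = remove_elements(doc, "test2")
result = doc.documentElement.toxml()
print(count, result)
```

Calling unlink() after removeChild() is optional here; it only helps the detached nodes get cleaned up sooner.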