I have a script that walks through all the XML files in a directory and parses each one to read the ICP attribute of the IS element. However, there are several thousand of these XML files, and some of them may not have the ICP attribute on IS. Is there a way to check for it with minidom?
Example of XML I am parsing that has the IS element with the ICP attribute:
<is ico="0000000000" pcz="1" icp="12345678" icz="12345678" oddel="99">
Example of XML I am parsing that has the IS element but no ICP attribute:
<is ico="000000000">
Here my script obviously fails, as there is no ICP. How can I check whether ICP is present?
My script:
import os
from xml.dom import minidom

# for testing purposes
directory = os.getcwd()
print("Source directory is: " + directory)
print("Walking the current directory, looking for XML files...")
print("Going through the XML files, looking for the provider's IČP...")
with open('ICP_all.txt', 'w') as SeznamICP_all:
    for root, dirs, files in os.walk(directory):
        for file in files:
            if (file.endswith('.xml')):
                xmldoc = minidom.parse(os.path.join(root, file))
                itemlist = xmldoc.getElementsByTagName('is')
                SeznamICP_all.write(itemlist[0].attributes['icp'].value + '\n')
print("Creating the list of unique IČP...")
with open('ICP_distinct.txt', 'w') as distinct:
    UnikatniICP = []
    with open('ICP_all.txt', 'r') as SeznamICP_all:
        distinct.writelines(set(SeznamICP_all))
input('Press any key to exit...')
I googled a lot, yet I cannot find a simple answer on how to check whether this is present in the XML using minidom.
Could you please give me some advice?
Note that icp is an attribute of the is element, not a tag. You can check for it with the hasAttribute(attributeName) method:
....
itemlist = xmldoc.getElementsByTagName('is')
if itemlist[0].hasAttribute("icp"):
    SeznamICP_all.write(itemlist[0].attributes['icp'].value + '\n')
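For a self-contained way to try this out, here is a minimal sketch of the same check wrapped in a helper. The function name extract_icp and the two sample strings are hypothetical, made up for illustration. The samples are the question's two examples with a closing tag added so that they parse as complete documents. The helper also guards against files that contain no is element at all, which the question does not mention but which would make itemlist[0] fail in the same way.

```python
from xml.dom import minidom


def extract_icp(xml_text):
    """Return the icp attribute of the first <is> element, or None if absent.

    Hypothetical helper for illustration; not part of the original script.
    """
    doc = minidom.parseString(xml_text)
    elements = doc.getElementsByTagName('is')
    if not elements:
        # No <is> element in this document at all.
        return None
    first = elements[0]
    if first.hasAttribute('icp'):
        return first.getAttribute('icp')
    return None


# Assumed sample documents: the question's examples with a closing tag added.
with_icp = '<is ico="0000000000" pcz="1" icp="12345678" icz="12345678" oddel="99"></is>'
without_icp = '<is ico="000000000"></is>'

print(extract_icp(with_icp))     # prints 12345678
print(extract_icp(without_icp))  # prints None
```

In the original loop, the same idea would mean writing to ICP_all.txt only when the helper returns something other than None. minidom's getAttribute returns an empty string for a missing attribute, so checking hasAttribute first keeps "missing" distinct from "present but empty".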