Python - How to Count XML attribute elements with Restricted values


I am trying to count only the file tags whose code attribute is greater than or equal to 10. Below is my code:

from xml.dom.minidom import parse, parseString
import xml.dom.minidom

DOMTree = xml.dom.minidom.parse("param.xml")
group = DOMTree.documentElement

code_line_10=[0,1,2,3,4,5,6,7,8,9]

num_source_file = 0
for file in group.getElementsByTagName("file"):
    if file.hasAttribute("code"):
         attribute_value = file.getAttribute("code")
         if attribute_value not in code_line_10:
             num_source_file += 1
print(num_source_file)

This is an extract of the XML file I'm using:

<?xml version="1.0"?><results>
<files>
<file name="cadasta-platform/cadasta/templates/allauth/account/password_set.html" blank="5" comment="0" code="11"  language="HTML" />
  <file name="cadasta-platform/cadasta/templates/allauth/openid/login.html" blank="7" comment="0" code="11"  language="HTML" />
  <file name="cadasta-platform/cadasta/resources/tests/test_views_mixins.py" blank="4" comment="0" code="11"  language="Python" />
  <file name="cadasta-platform/cadasta/core/tests/test_translations.py" blank="2" comment="0" code="11"  language="Python" />
  <file name="cadasta-platform/cadasta/organization/urls/default/users.py" blank="2" comment="0" code="11"  language="Python" />
  <file name="cadasta-platform/cadasta/core/node_modules/bootstrap-sass/assets/stylesheets/bootstrap/mixins/_alerts.scss" blank="2" comment="1" code="11"  language="SASS" />
  <file name="cadasta-platform/cadasta/resources/tests/utils.py" blank="2" comment="0" code="11"  language="Python" />
  <file name="cadasta-platform/cadasta/core/static/js/rel_tenure.js" blank="2" comment="1" code="11"  language="Javascript" />
  <file name="cadasta-platform/cadasta/templates/party/relationship_resources_new.html" blank="3" comment="0" code="11"  language="HTML" />
  <file name="cadasta-platform/functional_tests/pages/AccountInactive.py" blank="6" comment="1" code="11"  language="Python" />
  <file name="cadasta-platform/cadasta/core/management/commands/loadsite.py" blank="3" comment="0" code="10"  language="Python" />
  <file name="cadasta-platform/cadasta/core/node_modules/bootstrap-sass/assets/stylesheets/bootstrap/mixins/_hide-text.scss" blank="2" comment="9" code="10"  language="SASS" />
  <file name="cadasta-platform/functional_tests/projects/test_project.py" blank="13" comment="109" code="0"  language="Python" />

When I run the above code, it counts every file tag in the XML document, including the ones I want to exclude. What am I doing wrong?


Solution

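  • file.getAttribute("code") returns a str object, and a string never compares equal to an integer, so '1' in [1] is False. Every attribute value therefore passes the not in test, and every file is counted. There are several ways to solve the problem.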

    First the bad solutions:

    • Change code_line_10=[0,1,..,9] to code_line_10=['0','1',..,'9'].
    • Change the membership test to if int(attribute_value) not in code_line_10: (beware: if the code attribute cannot be converted to an int, this raises a ValueError)
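The type mismatch, and the effect of the two workarounds listed above, can be checked directly. This is a minimal sketch; the value "5" is a stand-in for whatever getAttribute("code") would return, and the three result names are made up for illustration.

```python
code_line_10 = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
attribute_value = "5"  # stand-in: getAttribute() always returns a str

# Original test: a str is never equal to an int, so nothing matches.
str_vs_int = attribute_value in code_line_10

# Workaround 1: compare against a list of strings instead.
str_vs_str = attribute_value in [str(n) for n in code_line_10]

# Workaround 2: convert the attribute to int before the membership test.
int_vs_int = int(attribute_value) in code_line_10

print(str_vs_int, str_vs_str, int_vs_int)  # False True True
```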

    Both of these still scan the list and compare the items one by one, which is unnecessary work. The simpler and faster solution is to compare the value directly with the >= operator: change the if to if int(attribute_value) >= 10: (again, if the code attribute cannot be converted to an int, this raises a ValueError)
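Put together, the comparison-based fix might look like the sketch below. It parses an inline sample with parseString instead of reading param.xml so that it runs on its own. The sample file names, the function name count_files_with_min_code and the decision to skip non-numeric code values silently are all assumptions made for illustration, not part of the original question.

```python
from xml.dom.minidom import parseString

# Hypothetical sample data standing in for param.xml.
SAMPLE_XML = """<?xml version="1.0"?>
<results>
<files>
  <file name="a.html" code="11" />
  <file name="b.py" code="10" />
  <file name="c.py" code="0" />
  <file name="d.py" code="n/a" />
  <file name="e.py" />
</files>
</results>"""


def count_files_with_min_code(xml_text, minimum=10):
    """Count <file> elements whose integer code attribute is >= minimum."""
    root = parseString(xml_text).documentElement
    count = 0
    for node in root.getElementsByTagName("file"):
        if not node.hasAttribute("code"):
            continue  # no code attribute at all
        try:
            code = int(node.getAttribute("code"))
        except ValueError:
            continue  # assumption: ignore values that are not integers
        if code >= minimum:
            count += 1
    return count


print(count_files_with_min_code(SAMPLE_XML))  # 2
```

If a malformed code value should be treated as an error rather than ignored, drop the try/except and let the ValueError propagate.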