python xml sqlite minidom

XML missing element in Python


The system uses the DOM parser (minidom) in Python 2.7.2. The goal is to extract the data into a .db file and use it on a SQL server. I currently have no problem with the sqlite3 library. I have read the similar questions and answers about handling a missing element while parsing XML files, but I still couldn't work out a solution. The XML has 15000+ elements. Here is a basic sample of the XML:

<topo>
   <vlancard>
      <id>4545</id>
      <nodeValue>21</nodeValue>
      <vlanName>voice</vlanName>
   </vlancard>
   <vlancard>
      <id>1234</id>
      <nodeValue>42</nodeValue>
      <vlanName>camera</vlanName>
   </vlancard>
   <vlancard>
      <id>9876</id>
      <nodeValue>84</nodeValue>
   </vlancard>
</topo>

Like the third element, several elements do not have the vlanName node. That makes the element counts inconsistent, e.g.:

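from xml.dom import minidom

# Raw string: in a normal string literal, '\v' is a vertical-tab escape
xmldoc = minidom.parse(r'c:\vlan.xml')
# Each call returns one flat, document-wide list, so the lists are not aligned per vlancard
vlId = xmldoc.getElementsByTagName('id')
vlValue = xmldoc.getElementsByTagName('nodeValue')
vlName = xmldoc.getElementsByTagName('vlanName')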

After running the module:

IndexError: list index out of range
>>> len(vlId)
16163
>>> len(vlName)
16155

Because of this, the elements end up out of order. When I print the table, the missing elements are skipped, so the remaining values land in the wrong rows. I use a simple while loop to insert the values into the table:

# Still raises IndexError: vlName is shorter than vlId when some vlancards lack <vlanName>
x = 0
while x < len(vlId):
    c.execute('insert into vlan (id, nodeValue, vlanName) values (?, ?, ?)',
              (vlId[x].firstChild.nodeValue,
               vlValue[x].firstChild.nodeValue,
               vlName[x].firstChild.nodeValue))
    x = x + 1

How else can I do this? Any help will be appreciated.

Yusuf


Solution

  • Instead of collecting every id, nodeValue, and vlanName across the whole document and then inserting, iterate over each vlancard, retrieve its id/value/name, and insert that row into the DB.
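
A minimal sketch of that per-vlancard approach, assuming a missing vlanName should be stored as NULL. The helper names child_text and load_vlans, the table schema, and the in-memory database are illustrative assumptions, not part of the original question; the sample XML is the one from the question.

```python
import sqlite3
from xml.dom import minidom

# Sample data copied from the question; the third vlancard has no <vlanName>.
SAMPLE_XML = """<topo>
   <vlancard>
      <id>4545</id>
      <nodeValue>21</nodeValue>
      <vlanName>voice</vlanName>
   </vlancard>
   <vlancard>
      <id>1234</id>
      <nodeValue>42</nodeValue>
      <vlanName>camera</vlanName>
   </vlancard>
   <vlancard>
      <id>9876</id>
      <nodeValue>84</nodeValue>
   </vlancard>
</topo>"""


def child_text(parent, tag):
    """Return the text of the first <tag> under parent, or None if absent or empty."""
    nodes = parent.getElementsByTagName(tag)
    if not nodes or nodes[0].firstChild is None:
        return None
    return nodes[0].firstChild.nodeValue


def load_vlans(xmldoc, conn):
    """Insert one row per <vlancard>; a missing field becomes NULL (hypothetical helper)."""
    rows = [
        (child_text(card, 'id'),
         child_text(card, 'nodeValue'),
         child_text(card, 'vlanName'))
        for card in xmldoc.getElementsByTagName('vlancard')
    ]
    # Placeholders let sqlite3 handle quoting and map None to NULL.
    conn.executemany(
        'insert into vlan (id, nodeValue, vlanName) values (?, ?, ?)', rows)
    conn.commit()
    return len(rows)


xmldoc = minidom.parseString(SAMPLE_XML)  # for a file, use minidom.parse(path)
conn = sqlite3.connect(':memory:')        # in-memory database, for illustration only
# Assumed schema; the question does not show the real table definition.
conn.execute('create table vlan (id text, nodeValue text, vlanName text)')
inserted = load_vlans(xmldoc, conn)
result = conn.execute(
    'select id, nodeValue, vlanName from vlan order by rowid').fetchall()
```

Because each lookup is scoped to a single vlancard, the id, nodeValue, and vlanName of a row always come from the same element, and a card with no vlanName no longer shifts the values that follow it.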