I have parsed XML file bu using Python SAX and successfully store the tag and values in dictionary. I used defaultdict to store the pairs for some nested tags with same tag names. Eg:
<Actions>
<Action sequenceNumber="1">
<ActionCode/>
<ActionDescription/>
</Action>
<Action sequenceNumber="2">
<ActionCode/>
<ActionDescription/>
</Action>
<Action sequenceNumber="3">
<ActionCode/>
<ActionDescription/>
</Action>
So the output would be like this, if they are empty:
defaultdict(, {u'ActionDescription': [' ', ' ', ' '], u'ActionCode': [' ', ' ', ' ']})
I need to insert the values into MySQL, so I put the column name and values in a string.
actionslist_insertstr = "INSERT INTO dbname.Actions ("
actionslist_valuesstr = "VALUES ('"
actionslist = handler.actions
for i, (k, v) in enumerate(actionslist.iteritems()):
actionslist_insertstr += k
actionslist_valuesstr += "/".join(v)
if i < len(actionslist)-1:
actionslist_insertstr += ","
actionslist_valuesstr += "','"
actionslist_insertstr += ")"
actionslist_valuesstr += "')"
INSERT INTO dbname.Actions (ActionDescription,ActionCode) VALUES (' / / ',' / / ')
But I don't want the values to be store as one string in the database. What can I do to split the values into rows, using the delimiter and store them according to their column name? I tried to split the values but then they wouldn't have relationship to their keys anymore? Would be greatly appreciate it if someone can give me an idea how to solve this problem. Thank you.
If I understand correctly, you have a data object called handler.actions
, and this is now a defaultdict, with items:
{
'ActionDescription': [' ', ' ', ' '],
'ActionCode': [' ', ' ', ' '],
}
because of the way you have parsed the XML.
This means that handler.actions['ActionDescription'][0]
is the description for sequenceNumber=1
and handler.actions['ActionCode'][0]
is the code for sequenceNumber=1
.
The rows in the XML file have turned into columns in the lists above.
If I'm right, you have two choices:
You could change the way you are parsing the XML. I'm sure there is a feature that will enable you to generate separate dictionaries for each <Action>...</Action>
item. If so, you could then process them in order.
You can use the Python zip
command to marry the two (or more) lists together.
Code:
# Fake it with hard-coded data
# actionslist = handler.actions
actionslist = {
'ActionDescription': [ 'd1', 'd2', ''],
'ActionCode': ['', 'c2', 'c3'],
}
tablename = 'dbname.Actions'
fieldnames = sorted(actionslist.keys())
insert_fields = 'INSERT INTO {table_name} ({field_list})'.format(
table_name=tablename,
field_list=', '.join(fieldnames)
)
insert_values = insert_fields + ' VALUES({})'
values = [actionslist[k] for k in fieldnames]
for tpl in zip(*values):
stmt = insert_values.format(', '.join(map(repr, tpl)))
print(stmt)
As output, I get:
INSERT INTO dbname.Actions (ActionCode, ActionDescription) VALUES('', 'd1')
INSERT INTO dbname.Actions (ActionCode, ActionDescription) VALUES('c2', 'd2')
INSERT INTO dbname.Actions (ActionCode, ActionDescription) VALUES('c3', '')