I am trying to use Python to create three files from one XML file.
There are three types of data in the XML file: estates, symbol names and tick types.
I want three text files, each listing one of those three things.
This is my current code, and it lists the estates absolutely fine:
from xml.dom import minidom
#Define the xmldoc object
xmldoc = minidom.parse('C:\\Temp\\Symbols.xml')
#Define EstateList by getting Elements by tag name
EstateList = xmldoc.getElementsByTagName('Estate')
#Print Estate List
print "There are currently %d data estates" % len(EstateList)
#print EstateList[0].attributes['EstateName'].value
for s in EstateList:
    print s.attributes['EstateName'].value
#Save Estate List to file
with open('dataestates.txt', 'w') as f:
    f.write("There are currently %d data estates \n" % len(EstateList))
    for s in EstateList:
        f.write(s.attributes['EstateName'].value + "\n")
However, I can't get anything to work for the other two, symbol names and tick types. I can't get close to listing the tick types; I've tried attributes, tags, all sorts.
Here is an example of the XML:
<Estates>
    <Estate EstateName="BBG.DL.BOND.RAW._LIVE">
        <Ticktype>BBG_BGN</Ticktype>
        <Ticktype>BBG_BVAL</Ticktype>
        <Ticktype>BBG_CBBT</Ticktype>
        <Ticktype>BBG_IXEP</Ticktype>
        <Ticktype>BBG_IXSP</Ticktype>
        <Ticktype>BBG_TRAC</Ticktype>
        <Ticktype>BBG</Ticktype>
    </Estate>
    <Estate EstateName="BBG.DL.CCY.RAW._LIVE">
        <Ticktype>BBG</Ticktype>
    </Estate>
</Estates>
<Symbols>
    <Symbol SymbolName="AT0000386073 Corp" Estate="BBG.DL.BOND.RAW._LIVE" TickType="BBG_BGN" />
    <Symbol SymbolName="AT0000386073 Corp" Estate="BBG.DL.BOND.RAW._LIVE" TickType="BBG_BVAL" />
</Symbols>
The interior text of a <Ticktype> element is stored in a child text node. To access the text, you must find that child; Node.firstChild should do it for you. Once you have the child node, you can get the text through its Text.data attribute.
Thus, given a <Ticktype> element, you can find its text as element.firstChild.data:
ticklist = xmldoc.getElementsByTagName('Ticktype')
print "There are currently %d tick types" % len(ticklist)
for s in ticklist:
    print s.firstChild.data
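One caveat: an empty element such as <Ticktype></Ticktype> has no child at all, so firstChild is None and .data would raise an AttributeError. The following is a minimal, self-contained sketch of a guard against that. The helper name tick_text, the inline SAMPLE string and its <Root> wrapper are assumptions made for illustration (they are not in your file), and it uses print() so that it also runs on Python 3.

```python
from xml.dom import minidom

# Inline sample modelled on the question's XML. The <Root> wrapper and the
# empty <Ticktype> are assumptions added for this illustration.
SAMPLE = """<Root>
<Estates>
<Estate EstateName="BBG.DL.BOND.RAW._LIVE">
<Ticktype>BBG_BGN</Ticktype>
<Ticktype>BBG</Ticktype>
<Ticktype></Ticktype>
</Estate>
</Estates>
</Root>"""


def tick_text(node):
    """Return the text of an element, or '' if it has no text child."""
    child = node.firstChild
    # An empty element has no children, so firstChild is None.
    if child is None or child.nodeType != child.TEXT_NODE:
        return ''
    return child.data.strip()


doc = minidom.parseString(SAMPLE)
ticks = [tick_text(n) for n in doc.getElementsByTagName('Ticktype')]
print(ticks)
```

If every <Ticktype> in your file is guaranteed to contain text, the plain s.firstChild.data form is enough.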
The tick types appear to have duplicate values. You can reduce them to unique values by using a set:
tickset = set(s.firstChild.data for s in ticklist)
print "There are %d unique tick types" % len(tickset)
for s in tickset:
    print s
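Note that a set does not keep the values in any particular order, which may matter once the list is written to a file. As a rough sketch of two alternatives: sort the set, or drop duplicates while keeping first-seen order. The helper name unique_in_order and the literal ticks list are made up for this example; in your script the values would come from ticklist.

```python
def unique_in_order(values):
    """Drop duplicates while keeping the first occurrence of each value."""
    seen = set()
    result = []
    for value in values:
        if value not in seen:
            seen.add(value)
            result.append(value)
    return result


# Stand-in for [s.firstChild.data for s in ticklist]
ticks = ['BBG_BGN', 'BBG_BVAL', 'BBG', 'BBG']

in_document_order = unique_in_order(ticks)
alphabetical = sorted(set(ticks))
print(in_document_order)
print(alphabetical)
```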
Symbols are stored much like estates: the name is an attribute (SymbolName) of each <Symbol> element. Thus, they are extracted the same way:
symlist = xmldoc.getElementsByTagName('Symbol')
print "There are currently %d symbols" % len(symlist)
for s in symlist:
    print s.attributes['SymbolName'].value
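Putting the pieces together, here is one possible sketch of writing all three files. It is not the only way to do it: the function name write_lists and the output names ticktypes.txt and symbols.txt are assumptions (only dataestates.txt comes from your code), and it de-duplicates symbol names because the sample repeats the same SymbolName for different tick types.

```python
import os
from xml.dom import minidom


def write_lists(xmldoc, out_dir):
    """Write estates, unique tick types and unique symbol names to text files."""
    estates = [e.getAttribute('EstateName')
               for e in xmldoc.getElementsByTagName('Estate')]
    # Skip empty <Ticktype> elements, which have no text child.
    ticks = sorted(set(t.firstChild.data
                       for t in xmldoc.getElementsByTagName('Ticktype')
                       if t.firstChild is not None))
    symbols = sorted(set(s.getAttribute('SymbolName')
                         for s in xmldoc.getElementsByTagName('Symbol')))
    outputs = {
        'dataestates.txt': estates,
        'ticktypes.txt': ticks,    # assumed file name
        'symbols.txt': symbols,    # assumed file name
    }
    for name, values in outputs.items():
        with open(os.path.join(out_dir, name), 'w') as f:
            f.write('\n'.join(values) + '\n')
    return outputs
```

With your file, it could be called as write_lists(minidom.parse('C:\\Temp\\Symbols.xml'), '.').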