I have some XML:
<sentence id="1086415:2">
<text>$6 and there is much tasty food, all of it fresh and continually refilled.</text>
<Opinions>
<Opinion to="31" from="27" polarity="positive" category="FOOD#STYLE_OPTIONS" target="food"/>
<Opinion to="31" from="27" polarity="positive" category="FOOD#QUALITY" target="food"/>
<Opinion to="31" from="27" polarity="positive" category="FOOD#PRICES" target="food"/>
</Opinions>
</sentence>
<sentence id="1086415:3">
<text>I am not a vegetarian but, almost all the dishes were great.</text>
<Opinions>
<Opinion to="48" from="42" polarity="positive" category="FOOD#QUALITY" target="dishes"/>
</Opinions>
I am trying to extract everything inside the Opinions tag and pair it with the text in a tuple. How can I do this with minidom? Currently, opinion contains only '\n '.
from xml.dom import minidom

xmldoc = minidom.parse("ABSA16_Restaurants_Train_SB1_v2.xml")
sentences = xmldoc.getElementsByTagName("sentence")
for sentence in sentences:
    text = sentence.getElementsByTagName("text")[0].firstChild.data
    opinion = sentence.getElementsByTagName("Opinions")[0].firstChild.data
Thank you.
Are you sure you need minidom?
From the docs:
Users who are not already proficient with the DOM should consider using the xml.etree.ElementTree module for their XML processing instead.
Unless you have a strong reason, don't waste your time: use the standard library's xml.etree.ElementTree. Its documentation has enough examples to solve your task. Feel free to ask in the comments if you run into trouble with it.
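As a minimal sketch of the ElementTree approach: the <sentences> wrapper root and the extract_pairs helper below are hypothetical (added only so the sample from the question parses as one document), and for the real file you would start from ET.parse(...).getroot() instead. The key point is that each <Opinion> carries its data in attributes, so you read .attrib rather than any text node.

```python
import xml.etree.ElementTree as ET

# Sample from the question, wrapped in a hypothetical <sentences> root so it is
# a single well-formed document. For the real data you would use:
#   root = ET.parse("ABSA16_Restaurants_Train_SB1_v2.xml").getroot()
SAMPLE = """<sentences>
<sentence id="1086415:2">
<text>$6 and there is much tasty food, all of it fresh and continually refilled.</text>
<Opinions>
<Opinion to="31" from="27" polarity="positive" category="FOOD#STYLE_OPTIONS" target="food"/>
<Opinion to="31" from="27" polarity="positive" category="FOOD#QUALITY" target="food"/>
<Opinion to="31" from="27" polarity="positive" category="FOOD#PRICES" target="food"/>
</Opinions>
</sentence>
<sentence id="1086415:3">
<text>I am not a vegetarian but, almost all the dishes were great.</text>
<Opinions>
<Opinion to="48" from="42" polarity="positive" category="FOOD#QUALITY" target="dishes"/>
</Opinions>
</sentence>
</sentences>"""


def extract_pairs(root):
    """Hypothetical helper: return a list of (text, [opinion attribute dicts])."""
    pairs = []
    # iter() walks the whole tree, so it finds <sentence> at any nesting depth
    for sentence in root.iter("sentence"):
        text = sentence.findtext("text")
        # Opinion data lives in attributes; .attrib is a plain dict of them.
        # Sentences without an <Opinions> block simply yield an empty list.
        opinions = [dict(op.attrib) for op in sentence.iterfind("Opinions/Opinion")]
        pairs.append((text, opinions))
    return pairs


root = ET.fromstring(SAMPLE)
for text, opinions in extract_pairs(root):
    print(text)
    for op in opinions:
        print("   ", op["target"], op["category"], op["polarity"])
```

Each tuple pairs the sentence text with a list of dicts such as {'target': 'dishes', 'category': 'FOOD#QUALITY', ...}, which is usually easier to work with than raw DOM nodes.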
Moreover, if you need to work with XML often, I'd recommend the third-party lxml library; it is a more powerful tool with some batteries included.
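If you really do have to stay with minidom, here is a minimal sketch of why the original code prints '\n ' and how to get at the opinions instead. The first child of <Opinions> is the whitespace text node between the tags, not an <Opinion> element, so .data returns that whitespace. The <sentences> wrapper root and the extract_pairs helper are hypothetical, used only to make the example self-contained.

```python
from xml.dom import minidom

# One sentence from the question, wrapped in a hypothetical <sentences> root.
# For the real file you would use:
#   doc = minidom.parse("ABSA16_Restaurants_Train_SB1_v2.xml")
SAMPLE = """<sentences>
<sentence id="1086415:3">
<text>I am not a vegetarian but, almost all the dishes were great.</text>
<Opinions>
<Opinion to="48" from="42" polarity="positive" category="FOOD#QUALITY" target="dishes"/>
</Opinions>
</sentence>
</sentences>"""


def extract_pairs(doc):
    """Hypothetical helper: return a list of (text, [opinion attribute dicts])."""
    pairs = []
    for sentence in doc.getElementsByTagName("sentence"):
        text = sentence.getElementsByTagName("text")[0].firstChild.data
        # Skip the whitespace text nodes entirely: ask for the <Opinion>
        # elements directly and read their attributes as (name, value) pairs.
        opinions = [dict(op.attributes.items())
                    for op in sentence.getElementsByTagName("Opinion")]
        pairs.append((text, opinions))
    return pairs


doc = minidom.parseString(SAMPLE)
# This is what the original code was reading: a whitespace-only text node.
print(repr(doc.getElementsByTagName("Opinions")[0].firstChild.data))
for text, opinions in extract_pairs(doc):
    print(text, opinions)
```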