I have an XML file with the following structure:
<article>
  <body>
    text1
    <collectionlink>
      text2
    </collectionlink>
    text3
  </body>
</article>
I used iterparse for parsing, but it's not printing the data correctly. Here is my code:
from xml.etree.ElementTree import iterparse, dump

def main():
    with open("sam.xml", "r") as fp:
        tree_dict = create_dict_tree_elements(fp)

def create_dict_tree_elements(fp):
    depth = 0
    for event, node in iterparse(fp, ['start', 'end', 'start-ns', 'end-ns']):
        # Namespace events carry no element, so skip them
        if event == 'start-ns' or event == 'end-ns':
            continue
        if event == 'start' and depth == 0:
            print(node.text)
            depth += 1
            continue
        if event == 'start' and depth > 0:
            print(node.text)
            depth += 1
        if event == 'end':
            depth -= 1

if __name__ == '__main__':
    main()
My expected output:
text1
text2
text3
Output I am getting:
text1
text2
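In ElementTree, node.text is the text between an element's opening tag and the next tag, whether that is a child's opening tag or the element's own closing tag. The text between an element's closing tag and the next tag is stored in that element's node.tail. In your sample, text3 is therefore collectionlink.tail, not part of any element's text, and since your loop only prints node.text, it never appears.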
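A minimal sketch of reading the sample in document order, assuming Python 3 and the sample XML embedded as a string (SAMPLE and texts_in_order are illustrative names, not from the question). The iterparse documentation states that text and tail are not guaranteed to be populated when a 'start' event is emitted, so this sketch parses the whole (small) document first and then walks it: each element's text comes before its children, and each child's tail comes right after that child.

```python
import xml.etree.ElementTree as ET

# Illustrative copy of the sample from the question
SAMPLE = """<article>
<body>
text1
<collectionlink>
text2
</collectionlink>
text3
</body>
</article>"""


def texts_in_order(elem):
    """Yield the non-blank text fragments of elem's subtree in document order."""
    # Text right after elem's opening tag
    if elem.text and elem.text.strip():
        yield elem.text.strip()
    for child in elem:
        # Everything inside the child first...
        yield from texts_in_order(child)
        # ...then the text right after the child's closing tag
        if child.tail and child.tail.strip():
            yield child.tail.strip()


root = ET.fromstring(SAMPLE)
for fragment in texts_in_order(root):
    print(fragment)
# text1
# text2
# text3
```

For this use case, Element.itertext() visits text and tails in the same order, so `[t.strip() for t in root.itertext() if t.strip()]` should produce the same three fragments.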