python xml iterparse

XML parsing not working correctly


I have an XML file with the following structure:

<article>
<body>
text1
<collectionlink>
text2
</collectionlink>
text3
</body>
</article>

I used iterparse to parse it, but it's not printing the data correctly. Here is my code:

from xml.etree.ElementTree import iterparse,dump

def main():
    fp=open("sam.xml",'r')
    tree_dict = create_dict_tree_elements(fp)

def create_dict_tree_elements(fp):
    depth=0
    for event,node in iterparse(fp, ['start', 'end', 'start-ns', 'end-ns']):
        if event=='start-ns' or event=='end-ns':
            continue
        if (event == 'start' and depth == 0):
            print node.text
            depth += 1
            continue        

        if (event == 'start' and depth >0 ):
            print node.text
            depth+=1

        if(event =='end' ):
            depth-=1



if __name__ == '__main__':
    main()

My expected output:

text1
text2
text3

The output I am getting:

text1
text2

Solution

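  • In ElementTree, node.text is only the text between an element's opening tag and the next tag. The text between an element's closing tag and the next tag is stored in that element's node.tail. In your file, text3 follows the closing </collectionlink> tag, so it is the tail of the collectionlink element rather than part of the text of body, and code that prints only node.text never reaches it.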
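
To illustrate the text/tail split, here is a minimal sketch in Python 3 that walks the sample document and yields both. The names SAMPLE and iter_text_in_order are made up for this example, and it parses the whole document with fromstring instead of iterparse to keep the walk short, so treat it as one possible approach rather than a drop-in fix for the code in the question.

```python
import xml.etree.ElementTree as ET

# The sample document from the question, inlined so the sketch is self-contained.
SAMPLE = """<article>
<body>
text1
<collectionlink>
text2
</collectionlink>
text3
</body>
</article>"""


def iter_text_in_order(node):
    """Yield non-blank text pieces of `node` in document order (hypothetical helper)."""
    # node.text: text between this element's opening tag and the next tag.
    if node.text and node.text.strip():
        yield node.text.strip()
    for child in node:
        yield from iter_text_in_order(child)
        # child.tail: text between the child's closing tag and the next tag.
        if child.tail and child.tail.strip():
            yield child.tail.strip()


root = ET.fromstring(SAMPLE)
pieces = list(iter_text_in_order(root))
for piece in pieces:
    print(piece)
```

If you only need the text and not the distinction between text and tail, the built-in Element.itertext() also walks both in document order, though it returns the raw strings including whitespace-only pieces.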