I have a large XML file, roughly structured as follows (in this order):
<document>
    <interesting_part>
        ...
    </interesting_part>
    <foo>
        ...
        60000 lines
        ...
    </foo>
</document>
My program is:
from xml.etree import ElementTree as et
with open(path_f) as f:
    tree = et.parse(f)
# retrieve info from tree...
Only the first block of the file interests me, but performance is poor because et.parse() loads the whole file.
How can I load the file only up to </interesting_part>?
I thought of something like:
class My_Parser(et.XMLParser):
    ????
my_parser = My_Parser()
tree = et.parse(f, my_parser)
Thanks in advance, Eric.
Use the iterparse() function instead, and simply stop iterating once you have what you want:
for event, element in et.iterparse(f):
    if element.tag == 'interesting_part':
        # `element` is the complete <interesting_part> element, with children
        # process it
        break  # ends parsing
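For anyone who wants to try this out, here is a minimal self-contained sketch of the same idea. The helper name `read_interesting_part`, the in-memory `sample` document, and its `<name>` child are made up for illustration and are not part of the original question. By default iterparse() reports only "end" events, so the element is fully built by the time its tag is seen; the sketch passes `events=("end",)` just to make that explicit.

```python
import io
from xml.etree import ElementTree as et


def read_interesting_part(source, tag="interesting_part"):
    """Return the first complete <tag> element from source, or None.

    `source` may be a filename or a file object opened in binary mode.
    """
    # An "end" event fires once an element's closing tag has been parsed,
    # so the element already holds all of its children at that point.
    for event, element in et.iterparse(source, events=("end",)):
        if element.tag == tag:
            return element  # leaving the loop stops further parsing
    return None


# Hypothetical stand-in for the large file described in the question.
sample = io.BytesIO(
    b"<document>"
    b"<interesting_part><name>Eric</name></interesting_part>"
    b"<foo>" + b"<line/>" * 60000 + b"</foo>"
    b"</document>"
)

part = read_interesting_part(sample)
print(part.find("name").text)
```

Note that iterparse() reads the input in chunks, so a little of what follows </interesting_part> may still be read and parsed before the loop exits, but the bulk of <foo> is never touched.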