I am using iterparse() from Python's lxml to parse a large XML file and extract the relevant data. This works fine, except for the first time an event occurs: the data for the first node is not captured. The same thing happens when I want to get the tag "way" (not in this code snippet). Why does the first event's element not get captured?
tree = etree.iterparse(state_file_xml, events=("start", "end"),tag=('node'))
context = iter(tree)
event, root = context.next()
nodes = {}
for event, elem in context:
    if ((event == 'end') and (elem.tag == 'node')):
        id = elem.get("id")
        lat = float(elem.get("lat"))
        lon = float(elem.get("lon"))
        nodes[id] = [lat, lon]
My XML file looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<osm version="0.6" generator="Overpass API 3079d8ea">
<note>The data included in this document is from www.openstreetmap.org. The data is made available under ODbL.</note>
<meta osm_base="2018-11-09T21:23:02Z"/>
<way id="46916568">
<nd ref="286427634"/>
<nd ref="3371562694"/>
<nd ref="3371562693"/>
<nd ref="1044837456"/>
<nd ref="1299487829"/>
<nd ref="1299487860"/>
<nd ref="284132018"/>
<tag k="highway" v="secondary"/>
<tag k="lit" v="yes"/>
<tag k="maxspeed" v="50"/>
<tag k="name" v="Zürcherstrasse"/>
<tag k="surface" v="asphalt"/>
<node id="30228243" lat="47.4030908" lon="8.4049015"/>
<node id="283533527" lat="47.4016971" lon="8.4036696"/>
<node id="284132018" lat="47.4034413" lon="8.4042634"/>
<node id="286427571" lat="47.4037481" lon="8.4058661"/>
<node id="286427634" lat="47.4043045" lon="8.4032429"/>
<node id="318217124" lat="47.4044289" lon="8.4054211"/>
<node id="428076175" lat="47.4027948" lon="8.4045078"/>
<node id="460527594" lat="47.4027445" lon="8.4055605"/>
<node id="460527973" lat="47.4029993" lon="8.4040697"/>
<node id="984783907" lat="47.4027808" lon="8.4054934"/>
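The line event, root = context.next() consumes the first event the parser yields. Because tag=('node') filters out every other element, that event belongs to the first node rather than to the root <osm> element: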
In [14]: tree = etree.iterparse(state_file_xml, events=("start", "end"),tag=('node'))
In [15]: context = iter(tree)
In [16]: event, root = next(context)
In [17]: root.attrib
Out[17]: {'id': '30228243', 'lon': '8.4049015', 'lat': '47.4030908'}
(I changed context.next() to next(context) so that the code works with both Python 2 and Python 3.)
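The session above shows root holding the first node instead of <osm>. A minimal sketch of why, assuming the standard library's xml.etree.ElementTree as a stand-in for lxml: it has no tag= argument, so the filter is emulated here with a generator, and SAMPLE is made-up data. Without a filter the first event is the start of the root element, which is where the "grab the root with next()" idiom comes from; with a filter the first event already belongs to the first node.

```python
import io
import xml.etree.ElementTree as ET  # stdlib stand-in; lxml's tag= filter is emulated below

# Hypothetical two-node sample in the same shape as the OSM extract.
SAMPLE = b"""<osm version="0.6">
  <node id="1" lat="47.40" lon="8.40"/>
  <node id="2" lat="47.41" lon="8.41"/>
</osm>"""

# Unfiltered parse: the first event is the "start" of the root element.
context = ET.iterparse(io.BytesIO(SAMPLE), events=("start", "end"))
first_event, first_elem = next(context)
print(first_event, first_elem.tag)  # start osm

# Emulated tag filter: only events for <node> elements get through,
# so the first next() call returns the first node, not the root.
filtered = (
    (event, elem)
    for event, elem in ET.iterparse(io.BytesIO(SAMPLE), events=("start", "end"))
    if elem.tag == "node"
)
skipped_event, skipped_elem = next(filtered)
print(skipped_event, skipped_elem.get("id"))  # start 1
```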
By the way, iterparse returns an iterator, so context = iter(tree) is unnecessary.
And since you only need to process each node once, events=("end",) is sufficient:
import lxml.etree as ET

context = ET.iterparse(state_file_xml, events=("end",), tag=('node'))
nodes = {}
for event, elem in context:
    id = elem.get("id")
    lat = float(elem.get("lat"))
    lon = float(elem.get("lon"))
    nodes[id] = [lat, lon]
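For a quick check without the real file, here is a self-contained sketch of the same "end"-only loop. It rests on a few assumptions: it uses the standard library's xml.etree.ElementTree rather than lxml, so the tag filter is done by hand; SAMPLE and read_nodes are hypothetical names; and the elem.clear() call is an optional addition for large files, not part of the code above.

```python
import io
import xml.etree.ElementTree as ET  # stdlib stand-in for lxml.etree

# Hypothetical sample data in the same shape as the OSM extract.
SAMPLE = b"""<osm version="0.6">
  <way id="10"><nd ref="1"/><tag k="highway" v="secondary"/></way>
  <node id="1" lat="47.4030908" lon="8.4049015"/>
  <node id="2" lat="47.4016971" lon="8.4036696"/>
  <node id="3" lat="47.4034413" lon="8.4042634"/>
</osm>"""


def read_nodes(source):
    """Return {node id: [lat, lon]} from an OSM-style XML stream."""
    nodes = {}
    # No next() call before the loop, so no event is consumed early.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "node":
            continue  # the stdlib has no tag= filter, so filter by hand
        nodes[elem.get("id")] = [float(elem.get("lat")), float(elem.get("lon"))]
        elem.clear()  # drop the element's attributes and children once copied
    return nodes


nodes = read_nodes(io.BytesIO(SAMPLE))
print(len(nodes), nodes["1"])
```

With lxml, passing tag='node' to iterparse makes the explicit elem.tag check unnecessary, as in the snippet above.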