This question is specifically about the pyosmium package. I was wondering whether the following functionality is possible, and if not, how it could be implemented.
I want to stream/yield certain instance attributes instead of accumulating them in memory.
Currently we can do the following:
import osmium

class Handler(osmium.SimpleHandler):
    def __init__(self):
        osmium.SimpleHandler.__init__(self)
        self.edge_and_nodes = []

    # Called once per way in the file; results accumulate in memory.
    def way(self, w):
        self.edge_and_nodes.append({'edge_id': w.id,
                                    'nodes': [n.ref for n in w.nodes]})

h = Handler()
h.apply_file("test.osm.pbf")
print("Edges and their connected nodes: {}".format(h.edge_and_nodes))
However, this does not scale when dealing with large regions.
I would like a way of yielding a dictionary that includes the way ID and its related node IDs (as well as tags, etc.) for every way object, instead of holding everything in memory. Is this possible?
I am looking for something like this:
class StreamHandler(osmium.SimpleHandler):
    def __init__(self):
        osmium.SimpleHandler.__init__(self)

    def way(self, w):
        yield {'edge_id': w.id,
               'nodes': [n.ref for n in w.nodes]}

h = StreamHandler()
h.apply_file("test.osm.pbf")
for row in h.way(w):  # <-- w is never available here; apply_file() handles ways internally
    print(row)
But I am not sure how to pass the w parameter (the way object), since that seems to be handled internally by the apply_file() method (and I can't seem to find the source code for that method).
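One way this could be implemented without changing pyosmium itself: run apply_file() on a background thread and have the way() callback push its dictionaries onto a bounded queue, then expose a plain generator on the consumer side. Below is a minimal sketch; stream_ways(), _QueueHandler and the queue size are my own names and choices, not part of pyosmium's API:

import queue
import threading

import osmium

_DONE = object()  # sentinel marking the end of the stream


class _QueueHandler(osmium.SimpleHandler):
    """Pushes one dict per way onto a queue instead of storing them all."""

    def __init__(self, out_queue):
        osmium.SimpleHandler.__init__(self)
        self.out_queue = out_queue

    def way(self, w):
        # Copy the data out here: osmium objects are only valid inside the callback.
        self.out_queue.put({'edge_id': w.id,
                            'nodes': [n.ref for n in w.nodes],
                            'tags': {t.k: t.v for t in w.tags}})


def stream_ways(path, max_queue_size=10000):
    """Yield one dict per way in the file (hypothetical helper, not part of pyosmium)."""
    q = queue.Queue(maxsize=max_queue_size)  # bounded -> backpressure on the parser

    def run():
        try:
            _QueueHandler(q).apply_file(path)
        finally:
            q.put(_DONE)

    threading.Thread(target=run, daemon=True).start()
    while True:
        item = q.get()
        if item is _DONE:
            return
        yield item


for row in stream_ways("test.osm.pbf"):
    print(row)

The bounded queue gives backpressure, so memory use stays flat even on large extracts, at the cost of one extra thread. (Newer pyosmium releases also appear to ship an osmium.FileProcessor that can be iterated over directly, which might make a wrapper like this unnecessary; worth checking the current documentation.)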
Thanks!
Edit: the source code can be found here
I found a work-around. Using pydriosm I was able to add some custom generators that parse and stream *.osm.pbf files completely in Python. This is ideal for a Spark or Dataflow job that streams the data into a database.
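For anyone wiring this into such a pipeline, here is a rough sketch of the consumer side, assuming a generator like the stream_ways() sketched above; the output file name and the newline-delimited JSON format are only illustrative, since both Spark and Dataflow can ingest JSON lines:

import json

# Write one JSON object per line so a downstream Spark/Dataflow job can
# read the records incrementally instead of loading everything at once.
with open("edges_and_nodes.jsonl", "w") as sink:
    for row in stream_ways("test.osm.pbf"):
        sink.write(json.dumps(row) + "\n")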