Search code examples
pythonmergeheappriority-queueheapq

How does heapq.merge() work with infinite generators?


I want to understand how heapq.merge() works with infinite generators. Consider this example:

>>> from heapq import merge
>>> from itertools import count
>>> m = merge(count(0, 2), count(1, 2))
>>> for _ in range(10):
...     print(next(m))
...
0
1
2
3
4
5
6
7
8
9 

The docs state that it does not pull the data into memory all at once. But how does it consume each of the infinite generators?


Solution

  • A very simple implementation of such a function could look like the following. Note, though, that for the sake of simplicity this does not handle any special (and not-so-special) cases like empty or exhausted iterables.

    def merge(*iterables):
        heap = [(next(it), i) for i, it in enumerate(iterables)]
        heapq.heapify(heap)
        while heap:
            val, i = heapq.heappop(heap)
            yield val
            heapq.heappush(heap, (next(iterables[i]), i))
    

    It works like this:

    • get the first element from each sorted iterable, together with that iterable's index in the list
    • yield the next smallest element from that heap
    • add the next element from the iterable with the same index as the one just yielded to the heap

    The actual implementation is a bit more involved, but seems to work roughly along the same lines. You can get the location of your local source with heapq.__file__, which on my system is /usr/lib/python3.6/heapq.py, and check yourself.