I've been messing with streams in python and been trying to generate the Hamming numbers (all the numbers with prime factors of 2, 3, or 5 only). The standard way for doing so, described by Dijkstra, is to observe that:
My implementation is this:
def hamming():
yield 1
yield from merge(scale_stream(hamming(), 2), scale_stream(hamming(), 3))
def merge(s1, s2):
x1, x2 = next(s1), next(s2)
while True:
if x1 < x2:
yield x1
x1 = next(s1)
elif x1 > x2:
yield x2
x2 = next(s2)
else:
yield x1
x1, x2 = next(s1), next(s2)
def scale_stream(stream, scalar):
for e in stream:
yield e * scalar
def stream_index(stream, n):
for i, e in enumerate(stream):
if i+1 == n:
return e
print(stream_index(hamming(), 300))
This does correctly produce the stream of Hamming numbers, however for whatever reason it takes more and more time the longer it generates, even though in theory the time complexity should be O(N).
I have played around with other streams before but my intuition for them is pretty weak so I have no idea what is going on here. I think the issue is in the recursive way I defined hamming(); I don't know if it is an issue that every call to hamming might spawn a new version of the process that has to run in parallel thereby slowing it down.
Honestly though like I said I have a very poor idea of what actually happens when I run it and debugging has gotten me nowhere, so if someone with more experience can enlighten me I would really appreciate it.
The further you get out into your stream, the more duplicates you're going to have to be merging. The number 2**4 * 3 **4 = 1296 is going to appear 70 times in your multiple stream (8 choose 4), and your program is going to be spending more time merging duplicates than it is outputting new items.
The further you go out, the more duplication you'r going to be dealing with. There is no reason to expect your program to be linear.