Tags: python, memory, out-of-memory, pypy

PyPy memory usage increases when nothing new is allocated


I'm not sure if this is a duplicate of the other PyPy memory questions, but here I'll provide a concrete example.

from __future__ import division

def mul_inv(a, m):
    """Modular multiplicative inverse, a^-1 mod m. Credit: rosettacode.org"""
    m0 = m
    x0, x1 = 0, 1
    if m == 1: return 1
    while a > 1:
        assert m != 0, "a and m must be coprime"
        q = a // m
        a, m = m, a%m
        x0, x1 = x1 - q * x0, x0
    if x1 < 0: x1 += m0
    return x1


M = 1000000009
L = 10**8

bin2 = [0] * L
bin2[0] = 1
for n in range(L-1):
    bin2[n+1] = (bin2[n] * (4*n + 2) * mul_inv(n+1, M)) % M
    if n % 10**5 == 0: print(n, bin2[n])

print(bin2[:20])

With Python 3.6 the program uses at most 3-4 GB and runs to completion (Armin Rigo's list change doesn't alter this significantly). With PyPy 5.10.0 (Python 2.7.13) the program quickly reaches 8 GB (all the RAM I have) and freezes. Even with gc.collect() calls, it runs out of memory when n is about 3.5 * 10^7.

Where is this memory usage coming from? The only large allocation should be the initialization of bin2 as a list of 10^8 ints. Nothing else should keep growing, assuming all the local variables in mul_inv are garbage collected.
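
For a sense of scale, here is a rough, illustrative estimate (added for context, not part of the original question) of what bin2 should cost on 64-bit CPython 3: about 8 bytes per list slot for the pointer array, plus roughly 28 bytes per distinct small int object once the list is filled, which lands near the observed 3-4 GB.

# Back-of-the-envelope estimate of bin2's cost on 64-bit CPython 3 (illustrative only).
L = 10**8
pointer_bytes = L * 8      # the list's own array of object pointers
int_obj_bytes = L * 28     # sys.getsizeof(1) == 28 for small ints on a 64-bit build
print((pointer_bytes + int_obj_bytes) / 2.0**30)   # roughly 3.4 GiB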


Solution

  • Oops, this is a bad case of PyPy's optimization for lists of integers. The problem is that the list starts out as a list of ints:

    bin2 = [0] * L
    

    PyPy stores this internally as a compact array of machine ints. That representation is usually much more compact than CPython's, although in this particular case it doesn't change anything, because on CPython the list is simply L references to the single shared object 0.

    But pretty soon we store a long into the list. At that point PyPy has to convert the whole list to the generic kind that can store anything, and in doing so it sees 100 million zeroes and creates 100 million boxed 0 objects. That instantly produces about 3 GB of memory pressure for nothing, on top of the roughly 800 MB for the list itself.
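
    A small sketch (not from the original answer) of how to watch this switch happen under PyPy, using the PyPy-only __pypy__.strategy() introspection helper on a tiny list; the exact strategy names may vary between PyPy versions:

    # PyPy-only sketch: inspect a list's internal storage strategy.
    from __pypy__ import strategy   # not available on CPython

    small = [0] * 10
    print(strategy(small))   # e.g. 'IntegerListStrategy': compact array of machine ints

    small[0] = 10**100       # storing a long forces the generic representation
    print(strategy(small))   # e.g. 'ObjectListStrategy': every slot now holds a boxed object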

    We can check that the problem doesn't occur if we initialize the list like this, so that it really contains 100 million references to the same object:

    bin2 = [0L] * L     # Python 2.x
    bin2[0] = 1
    

    That said, in your example you don't need the list to contain 100 million elements in the first place. You can initialize it as:

    bin2 = [1]
    

    and use bin2.append(). This lets the program start much more quickly and without any large memory usage near the beginning.
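
    For concreteness, here is a sketch of the loop from the question rewritten this way (the arithmetic is unchanged):

    bin2 = [1]
    for n in range(L - 1):
        # appending at the end is equivalent to writing bin2[n+1]
        bin2.append((bin2[n] * (4*n + 2) * mul_inv(n + 1, M)) % M)
        if n % 10**5 == 0: print(n, bin2[n])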

    Note that PyPy3 still uses more memory than CPython3.