This code compiles and runs just fine:
cdef enum:
    SIZE = 1000000

cdef int ar[SIZE]
cdef int i

for i in range(SIZE):
    ar[i] = i

print(ar[5])
but this code:
cdef enum:
    SIZE = 1000000

def test():
    cdef int ar[SIZE]
    cdef int i
    for i in range(SIZE):
        ar[i] = i
    print(ar[5])

test()
crashes the Python kernel (I'm running this with the Jupyter magic).
Is there some limit to how large arrays can be inside of functions? If there is, is there a way to remove that limit?
In the first case, the array is a global, statically defined plain C array, which lives in the program's static data segment rather than on the stack. In the second case, the local-variable array is allocated on the stack. The problem is that the stack has a fixed maximum size (typically somewhere between 1 and 8 MiB, and this varies a lot between platforms), and a 1,000,000-element int array needs roughly 4 MB, which can easily exceed it. If you try to write to memory beyond the limit of the stack, you get a crash, or a nice stack-overflow error if you are lucky. Note that function calls and local variables also temporarily take up space on the stack.

While the stack size can be increased, the best solution is to use dynamic allocation for relatively big arrays or arrays whose size is variable. Fortunately, this is possible in Cython using malloc and free (see the documentation), but you should be careful not to forget the free (nor to free the array twice). An alternative solution is to create a NumPy array and then work with memory views; this is a bit more expensive but prevents any possible leak.
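As a minimal sketch of the malloc/free approach (the error message and the try/finally layout are just one reasonable way to structure it), runnable in a %%cython cell:

from libc.stdlib cimport malloc, free

cdef enum:
    SIZE = 1000000

def test():
    # Heap allocation: the size is no longer limited by the stack
    cdef int *ar = <int *> malloc(SIZE * sizeof(int))
    cdef int i
    if ar == NULL:
        raise MemoryError("failed to allocate the array")
    try:
        for i in range(SIZE):
            ar[i] = i
        print(ar[5])
    finally:
        free(ar)  # release the buffer exactly once

test()

The try/finally guarantees the free runs even if an exception is raised while filling the array, which avoids both leaks and double frees.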
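And a sketch of the NumPy/memory-view alternative (using np.intc here on the assumption that the dtype should match a C int):

import numpy as np

cdef enum:
    SIZE = 1000000

def test():
    # NumPy owns the buffer (allocated on the heap), so there is nothing to free manually
    cdef int[::1] ar = np.empty(SIZE, dtype=np.intc)
    cdef int i
    for i in range(SIZE):
        ar[i] = i
    print(ar[5])

test()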