How to use a prange with a python function in cython?


This question is closely related to question 53641278. In the comments on that question, @ead suggested using the approach from @DavidW's comment on question 53619438.

In my problem, the user gives me a custom Python function (e.g. times_two()) that I don't know beforehand; I only know the data types of its parameter and return value (int and int). My job is to evaluate this function over a large range of parameters (e.g. 1000 values). To speed this up, I'd like to use prange, which requires C data types.

I want to use the approach from the comment mentioned above, and I tried this:

script.pyx

import ctypes

ctypedef int (* FuncPtr) (int tt)
ftype = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_int)

cdef loop_over(FuncPtr cy_f_ptr):
    cdef:
        int i, s = 0

    for i in range(1000):
        s += cy_f_ptr(i)

    return s

cpdef get_sum(func):
    cdef FuncPtr cy_f_ptr = (<FuncPtr *> <size_t> ctypes.addressof(ftype(func)))[0]
    s = loop_over(cy_f_ptr)
    return s

setup.py

from setuptools import setup  # distutils is deprecated and was removed in Python 3.12
from Cython.Build import cythonize

setup(name="script", ext_modules=cythonize("script.pyx", compiler_directives={"language_level": "3"}))

terminal

python setup.py build_ext -i

main.py

from script import get_sum


def times_two(x):
    return x*2


print(get_sum(times_two))

When running, I get the following error:

Process finished with exit code -1073741819 (0xC0000005)

I expected the code to print the value 999000.


Solution

  • I suspect your code is crashing because you've thrown away a temporary:

    You've changed

    f = ftype(func)
    cdef FuncPtr cy_f_ptr = (<FuncPtr *> <size_t> ctypes.addressof(f))[0]
    

    to

    cdef FuncPtr cy_f_ptr = (<FuncPtr *> <size_t> ctypes.addressof(ftype(func)))[0]
    

    The ftype object must be kept alive for as long as the function pointer is. In my version it is, while in your version the ftype object is destroyed pretty much instantly, meaning the function pointer is invalid as soon as you have it.
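
The lifetime rule above can be tried out in plain Python, without compiling anything. This is only a minimal sketch, not the question's Cython code: the name `get_sum_py` and the use of `ctypes.cast` to read the raw address are assumptions made for illustration. The point is that the wrapper is held in a named local for as long as the bare address is in use.

```python
import ctypes

# Same signature as the question's FuncPtr: int (*)(int)
ftype = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_int)


def times_two(x):
    return x * 2


def get_sum_py(func, n=1000):
    # Hypothetical pure-Python counterpart of get_sum() from the question.
    # Bind the wrapper to a name so it stays alive while the address is used.
    wrapper = ftype(func)
    # Raw address of the C-callable thunk that ctypes generated.
    address = ctypes.cast(wrapper, ctypes.c_void_p).value
    # Rebuild a callable from the bare address; this is only valid
    # while `wrapper` is still alive.
    c_func = ftype(address)
    total = 0
    for i in range(n):
        total += c_func(i)
    return total


result = get_sum_py(times_two)
print(result)
```

If the wrapper were created inline and dropped before `c_func` is called, the address would point at freed memory, which matches the access violation reported in the question.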


    However, there's a fundamental problem here: to call a Python function you MUST hold the GIL, which means you can't use prange to parallelize it. Your function pointer scheme disguises this a little but it's still true.

    What would sort of work is to introduce a with gil: block around the call to cy_f_ptr. This would only be worthwhile if the function releases the GIL internally (i.e. ends up calling into optimized C code itself). It's possible that ctypes acquires the GIL itself (I'm not sure), but that still wouldn't change the conclusion: you can't parallelize Python code like this.
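
To make the GIL point concrete, here is a rough sketch that calls the same kind of ctypes wrapper from several threads. The name `threaded_sum` and the choice of `ThreadPoolExecutor` are assumptions for illustration and are not part of the question's code. The result is correct, but on a standard CPython build the body of the Python function still runs with the GIL held, so for a pure-Python function like times_two no speedup should be expected.

```python
import ctypes
from concurrent.futures import ThreadPoolExecutor

# Same signature as the question's FuncPtr: int (*)(int)
ftype = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_int)


def times_two(x):
    return x * 2


def threaded_sum(func, n=1000, workers=4):
    # Hypothetical helper: keep the wrapper alive for the whole computation.
    wrapper = ftype(func)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each call goes into C through ctypes and comes back into Python
        # through the callback, which needs the GIL while func's body runs,
        # so the threads take turns rather than running in parallel.
        return sum(pool.map(wrapper, range(n)))


threaded_result = threaded_sum(times_two)
print(threaded_result)
```

Threads only pay off here if the user's function spends most of its time in code that releases the GIL, as noted above.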