This question is closely related to: 53641278. In the comments of that question there was a suggestion by @ead to use the approach from comment by @DavidW of question 53619438.
In my problem, the user supplies a custom Python function (e.g. times_two()) that I don't know beforehand; I only know the data types of its parameter and return value (int and int). My job is to evaluate this function over a large range of parameters (e.g. 1000 values). To speed this up, I'd like to use prange, which requires C data types.
I want to use the approach from the mentioned comment and I try this:
script.pyx
import ctypes

ctypedef int (* FuncPtr) (int tt)
ftype = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_int)

cdef loop_over(FuncPtr cy_f_ptr):
    cdef:
        int i, s = 0
    for i in range(1000):
        s += cy_f_ptr(i)
    return s

cpdef get_sum(func):
    cdef FuncPtr cy_f_ptr = (<FuncPtr *> <size_t> ctypes.addressof(ftype(func)))[0]
    s = loop_over(cy_f_ptr)
    return s
setup.py
from distutils.core import setup
from Cython.Build import cythonize
setup(name="script", ext_modules=cythonize("script.pyx", compiler_directives={"language_level": "3"}))
terminal
python setup.py build_ext -i
main.py
from script import get_sum
def times_two(x):
    return x*2
print(get_sum(times_two))
When running, I get the following error:
Process finished with exit code -1073741819 (0xC0000005)
My expectation was that the code would print the value 999000.
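For reference, the expected value can be checked with a plain-Python baseline. This is only a sanity check of the arithmetic (the sum of 2*i for i from 0 to 999), not a replacement for the Cython loop:

```python
def times_two(x):
    return x * 2

# Same loop as loop_over() in script.pyx, written in pure Python.
expected = sum(times_two(i) for i in range(1000))
print(expected)
```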
I suspect your code is crashing because you've thrown away a temporary:
You've changed
f = ftype(func)
cdef FuncPtr cy_f_ptr = (<FuncPtr *> <size_t> ctypes.addressof(f))[0]
to
cdef FuncPtr cy_f_ptr = (<FuncPtr *> <size_t> ctypes.addressof(ftype(func)))[0]
The ftype(func) object must be kept alive for as long as the function pointer is in use. In my version it is, because it is bound to f. In your version the ftype(func) object is a temporary that is destroyed almost immediately, so the function pointer is already invalid by the time you have it.
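The lifetime rule can be illustrated in pure Python with ctypes alone. This is a minimal sketch, not the Cython code itself: the names callback, address and raw_call are made up for illustration, and the from_address read is assumed to mirror what the (<FuncPtr *> <size_t> ctypes.addressof(f))[0] expression does.

```python
import ctypes

ftype = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_int)

def times_two(x):
    return x * 2

# Keep a named reference: the C-callable thunk only exists while this object does.
callback = ftype(times_two)

# Read the raw function pointer stored inside the ctypes object.
address = ctypes.c_void_p.from_address(ctypes.addressof(callback)).value

# Rebuild a callable from the raw address; it is valid only while `callback` is alive.
raw_call = ftype(address)

total = sum(raw_call(i) for i in range(1000))
print(total)
```

If callback were a temporary instead (for example ctypes.addressof(ftype(times_two)) written inline), the address would refer to memory that may already have been freed, which matches the access violation reported in the question.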
However, there's a fundamental problem here: to call a Python function you must hold the GIL, which means you can't use prange to parallelize it. Your function pointer scheme disguises this a little, but it's still true.
What would sort of work is introducing a with gil: block around the call to cy_f_ptr. This would only be worthwhile if the function releases the GIL internally (i.e. it ends up calling into optimized C code itself). It's possible that ctypes acquires the GIL itself (I'm not sure), but that wouldn't change the conclusion: you can't parallelize Python code like this.
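The same limitation shows up with ordinary Python threads. The following is a rough sketch, assuming a standard GIL-enabled CPython build; chunk_sum and the four-way split are hypothetical choices made for illustration. The threaded version gives the same answer, but because every call to times_two needs the GIL, the threads take turns rather than running the Python code simultaneously.

```python
from concurrent.futures import ThreadPoolExecutor

def times_two(x):
    return x * 2

def chunk_sum(bounds):
    # Sum times_two over the half-open range [start, stop).
    start, stop = bounds
    return sum(times_two(i) for i in range(start, stop))

# Split 0..999 into four chunks of 250 values each.
chunks = [(start, start + 250) for start in range(0, 1000, 250)]

serial_total = sum(chunk_sum(c) for c in chunks)

# Each worker must hold the GIL while it runs times_two, so the work is
# interleaved rather than executed in parallel.
with ThreadPoolExecutor(max_workers=4) as pool:
    threaded_total = sum(pool.map(chunk_sum, chunks))

print(serial_total, threaded_total)
```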