I'm trying to compute some columns of a numpy
array, operating on python objects (numpy
array) in a for loop using a cdef
function.
I would like to do it in parallel. But not sure how to.
Here is a toy example, one def
function calls a cdef
function in a for loop using prange
, which is not allowed because np.ndarray
is a python object. In my real problem, one matrix and one vector are the arguments of the cdef
function, and some numpy
matrix operations are performed, like np.linalg.pinv()
(which I guess is actually the bottleneck).
%%cython
import numpy as np
cimport numpy as np
from cython.parallel import prange
from c_functions import estimate_coef_linear_regression
DTYPE = np.float
ctypedef np.float_t DTYPE_t
def transpose_example(np.ndarray[DTYPE_t, ndim=2] data):
"""
Transposes a matrix. It does each row independently and parallel
"""
cdef Py_ssize_t n = data.shape[0]
cdef Py_ssize_t t = data.shape[1]
cdef np.ndarray[DTYPE_t, ndim = 2] results = np.zeros((t, n))
cdef Py_ssize_t i
for i in prange(n, nogil=True):
results[i, :] = transpose_vector(data[:, i])
return results
cdef transpose_vector(np.ndarray[DTYPE_t, ndim=1] vector):
"""
transposes a np vector
"""
return vector.transpose()
a = np.random.rand(100, 20)
transpose_example(a)
outputs
Converting to Python object not allowed without gil
What would be the best way to do this in parallel?
You can pass typed memoryview slices (cdef transpose_vector(DTYPE_t[:] vector)
) around without the GIL - it's one of the key advantages of the newer typed memoryview syntax over np.ndarray
.
However,
np.asarray(vector)
). This requires the GIL.transpose
) is going to require the GIL. This can be done inside a with gil:
block, but when that block is almost your entire loop that becomes pretty pointless.transpose_vector
, and so it'll default to object
, which requires the GIL. You could specify a Cython return type, but I suspect even returning a memoryview slice may require some reference counting somewhere.In summary: memoryview slices, but bear in mind you're quite limited in what you can do without the GIL. Your current example just isn't parallelizable (but this may be mostly because it's a toy example).