I am fairly new to Cython and I am wondering why the following takes so long:
# test.pyx
cimport numpy as np

cpdef test(a):
    cdef np.ndarray[dtype=int] b
    for i in range(10):
        b = a
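import functools
import timeit
import numpy as np
import test  # the compiled Cython module containing test()
a = np.array([1, 2, 3], dtype=int)
t = timeit.Timer(functools.partial(test.test, a))
print(t.timeit(1000000))
-> 0.5446977 seconds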
If I comment out the cdef declaration, this runs in no time. If I declare "a" as np.ndarray in the function header, nothing changes. Also, id(a) == id(b), so no new objects are created.
Similar behaviour can be observed when calling a function that takes many ndarrays as arguments, e.g.
cpdef foo(np.ndarray a, np.ndarray b,np.ndarray c, ..... )
Can anybody help me? What am I missing here?
Edit: I noticed the following:
This is slow:
cpdef foo(np.ndarray[dtype=int, ndim=1] a, np.ndarray[dtype=int, ndim=1] b, np.ndarray[dtype=int, ndim=1] c):
    return
This is faster:
def foo(np.ndarray[dtype=int, ndim=1] a, np.ndarray[dtype=int, ndim=1] b, np.ndarray[dtype=int, ndim=1] c):
    return
This is the fastest:
cpdef foo(a, b, c):
    return
The function foo() is called very frequently (many millions of times) in my project from many different locations and does some calculations with the three numpy arrays (however, it doesn't change their content).
I basically need the speed of knowing the data type inside the arrays while also having a very low function-call overhead. What would be the most adequate solution for this?
b = a generates a bunch of type checking that needs to identify whether the type of a is actually an ndarray and makes sure it exports the buffer protocol with an appropriate element type. In exchange for this one-off cost you get fast indexing of single elements.
If you're not doing indexing of single elements then typing as np.ndarray is literally pointless and you're pessimizing your code. If you are doing this indexing then you can get significant optimizations.
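As a rough illustration of that trade-off, here is a minimal sketch. The function names are made up for the example, and it assumes the array really holds 64-bit integers (dtype=np.int64); a different dtype would fail the buffer check on entry.

```cython
# Hypothetical example module, for illustration only.
cimport numpy as np
import numpy as np

np.import_array()  # harmless if Cython already does this automatically

def sum_typed(np.ndarray[np.int64_t, ndim=1] a):
    # The buffer check happens once, when the argument is received.
    # After that, a[i] compiles to direct C array access.
    cdef Py_ssize_t i
    cdef np.int64_t total = 0
    for i in range(a.shape[0]):
        total += a[i]
    return total

def passthrough(a):
    # No single-element indexing here, so typing `a` would only add
    # the up-front check without buying anything.
    return a
```

Calling sum_typed(np.arange(5, dtype=np.int64)) exercises the typed path; the longer the loop, the more the one-off check is amortized.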
"If I comment out the cdef declaration this is done in no-time."
This is often a sign that the C compiler has realized the entire function does nothing and optimized it out completely, so your measurement may be meaningless.
cpdef foo(np.ndarray a, np.ndarray b,np.ndarray c, ..... )
Just specifying the type as np.ndarray without specifying the element dtype usually gains you very little, and is probably not worthwhile.
If you have a function that you're calling millions of times then it is likely that the input arrays come from somewhere, and can be pre-typed there, probably much less frequently. For example, might they come from taking slices of a larger array?
The newer memoryview syntax (int[:]) is quick to slice, so for example if you already have a 2D memoryview (int[:,:] x) it's very quick to generate a 1D memoryview from it (e.g. x[:,0]), and it's quick to pass existing memoryviews into a cdef function with memoryview arguments. (Note that (a) I'm unsure whether all of this applies to np.ndarray too, and (b) setting up a fresh memoryview is likely to cost about the same as an np.ndarray, so I'm only suggesting using them because I know slicing is quick.)
Therefore my main suggestion is to move the typing outwards to try to reduce the number of fresh initializations of these typed arrays. If that isn't possible then I think you may be stuck.
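A minimal sketch of what "moving the typing outwards" could look like, assuming the three inputs can be taken as columns of one larger 2D array of 64-bit integers. The names kernel and run_all, and the arithmetic inside kernel, are placeholders rather than anything from the question.

```cython
# Sketch only: `kernel` and `run_all` are hypothetical names.
from libc.stdint cimport int64_t

cdef int64_t kernel(int64_t[:] a, int64_t[:] b, int64_t[:] c):
    # The arguments arrive as already-typed memoryviews, so the buffer
    # of the underlying array is not checked again on each call.
    cdef Py_ssize_t i
    cdef int64_t total = 0
    for i in range(a.shape[0]):
        total += a[i] * b[i] + c[i]  # placeholder computation
    return total

def run_all(int64_t[:, :] x):
    # The buffer is acquired and type-checked once, here.
    cdef Py_ssize_t j
    cdef int64_t acc = 0
    for j in range(x.shape[1] - 2):
        # Each call only slices the existing 2D view.
        acc += kernel(x[:, j], x[:, j + 1], x[:, j + 2])
    return acc
```

From Python this would be called once with the whole array, e.g. run_all(np.arange(12, dtype=np.int64).reshape(3, 4)). Note that the element type of the memoryview has to match the array's dtype exactly, which is why the sketch spells out int64_t rather than relying on a plain int.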