I am trying to learn cython, where I compile with annotate=True
.
Says in The basic manual:
If a line is white, it means that the code generated doesn’t interact with Python, so will run as fast as normal C code. The darker the yellow, the more Python interaction there is in that line
Then I wrote this code following (as much as I understood) numpy in cython basic manual instructions:
+14: cdef entropy(counts):
15: '''
16: INPUT: pandas table with counts as obsN
17: OUTPUT: general entropy
18: '''
+19: cdef int l = counts.shape[0]
+20: cdef np.ndarray probs = np.zeros(l, dtype=np.float)
+21: cdef int totals = np.sum(counts)
+22: probs = counts/totals
+23: cdef np.ndarray plogp = np.zeros(l, dtype=np.float)
+24: plogp = ( probs.T * (np.log(probs)) ).T
+25: cdef float d = np.exp(-1 * np.sum(plogp))
+26: cdef float relative_d = d / probs.shape[0]
27:
+28: return {'d':d,
+29: 'relative_d':relative_d
30: }
Where all the "+
" at the beginning of the line are yellow in the cython.debug.output.html file.
What am I doing very wrong? How can I make at least part of this function run at c speed? The function returns a python dictionary, hence I think that I can't returned any c data type. I might be wrong here to.
Thank you for the help!
First of all, Cython does not rewrite Numpy functions, it just call them like CPython does. This is the case for np.zeros
, np.sum
or np.log
for example. Such calls will not be faster with Cython. If you want a faster code you can use plain loops to reimplement them in you code. However, this may not be faster: on one hand Numpy calls introduce an overhead (due to type checking AFAIK still enabled with Cython, internal function calls, wrappers, etc) certainly significant if you use small arrays and each function generate huge temporary arrays that are often slow to read/write; on the other hand, some Numpy functions makes use of highly-optimized code (like BLAS or low-level SIMD intrinsics). Moreover, the division in Python does not behave the same way than C. This is why Cython provides the flag cython.cdivision
which can be set to True
(it is False
by default). If the Python division is used, Cython generate a slower wrapping code. Finally, np.ndarray
is a CPython type and behave as such, you can use memoryviews so not to deal with Numpy objects.
If you want to get a fast code, you certainly need to use memoryviews, loops and and avoid creating temporary arrays as well as using multiple threads. Additionally, you can use np.empty
instead of np.zeros
in your case. Besides this, the Numpy transposition is not very efficient and Numpy does not solves this problem. You can implement a tiled-transposition to speed it up but this is not trivial to implement it efficiently. Here is a Numba implementation that can certainly be easily transformed to a Cython code. Putting some cdef
on a Python Numpy code generally does not make it faster.