
NumPy in Cython: compile-time type vs original type


In Cython, there are compile-time types corresponding to NumPy's types. It seems that the compile-time types are faster than the original types. Adding the C type to the mix, there are, for example, three names that can be used to declare an integer type: int, np.int, and np.int_t.

The official tutorial uses all three of these types, which confuses me. Here are my questions:

Is it correct to stick to a single data type to get better performance, and if so, which one should I choose? If using a single data type is not correct, how should I decide which type to use in different parts of my program?


Solution

  • The other answer is unfortunately only partly right, and the Cython documentation looks like it's partly outdated.

    There are essentially two contexts where you use Numpy types with Cython:

    1. They are Python objects that can be passed to Numpy functions as a dtype= argument. These tell Numpy what type of array to create. From Cython's point of view they're the same as any other Python object; it is Numpy that treats them as special indicators. np.int used to be an example of these (but has now been removed in favour of just using the normal Python int). Specifically sized integer dtypes like np.int32 are still available though.

      arr = np.zeros((5, 10), dtype=np.int32)
      

      These are not Cython-specific. They're the same as you'd use in normal Python code.

    2. The second use is as C integer types. np.int_t does exist (this is where the other answer is wrong). However, it's a C typedef that's only exposed in the .pxd file that wraps the Numpy internals for Cython. These typedefs are what you cimport from Numpy, rather than what you import from Numpy.

      You use these types anywhere that a C type would be expected (e.g. cdef int_t some_var or cdef int_t[:] some_memoryview). They largely have the same name as the dtypes but with _t on the end.
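The first context can be checked in plain Python, with no Cython involved. The sketch below only illustrates that a dtype such as np.int32 is an ordinary object that can be stored in a variable and passed around; the variable names are made up for the illustration and are not from the original answer.

```python
import numpy as np

# A dtype argument is an ordinary Python object: it can be bound to a
# name and handed to a function like any other value.
dtype_choice = np.int32
arr = np.zeros((5, 10), dtype=dtype_choice)

shape = arr.shape                    # shape requested above
item_size = arr.dtype.itemsize       # bytes per element
is_same_dtype = arr.dtype == np.dtype(np.int32)

print(shape, item_size, is_same_dtype)
```

The C typedefs from the second context have no such runtime object; they only exist while Cython compiles a .pyx file that cimports them.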

    As an example of how you'd combine the two, you can create a 2D memoryview, and allocate an array for it to view with the line

    cdef np.int32_t[:,:] mview = np.zeros((5, 10), dtype=np.int32)
    

    The plain int type has two meanings in Cython. It can be used as the normal Python integer object (e.g. you can pass it as a dtype argument). However, in other contexts Cython interprets it as a C integer. Therefore you could do

    cdef int[:,:] mview = np.zeros((5, 10), dtype=int)
    

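    and this would also work, provided Numpy's default integer is the same size as a C int on your platform (on most 64-bit systems it is 64 bits wide, and np.intc is the dtype that always matches a C int). In the first use (cdef int[:,:]) int is a C type; in the second (dtype=int) it is a normal Python object.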
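When relying on plain int on both sides of that assignment, it may be worth checking that the two meanings agree in width on your machine. The following is a minimal Python sketch, assuming only NumPy and the standard ctypes module; it compares the size of a C int with NumPy's default integer and with np.intc. The variable names are illustrative.

```python
import ctypes
import numpy as np

# Size of a C `int` as reported by the C ABI of this Python build.
c_int_bytes = ctypes.sizeof(ctypes.c_int)

# dtype=int selects NumPy's default integer, which may be wider than a C int.
default_int_bytes = np.dtype(int).itemsize

# np.intc is the dtype defined to correspond to a C `int`.
intc_bytes = np.dtype(np.intc).itemsize

# An array allocated with np.intc therefore holds C ints.
arr = np.zeros((5, 10), dtype=np.intc)
matches_c_int = arr.dtype.itemsize == c_int_bytes

print("C int:", c_int_bytes, "default int:", default_int_bytes, "np.intc:", intc_bytes)
```

If the default integer turns out wider than a C int, allocating with dtype=np.intc (or declaring the memoryview with a sized type such as np.int32_t and a matching dtype) keeps the two sides consistent.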

    It's slightly confusing because Cython straddles Python (where types are just Python objects like any other Python object) and C (where a type is used to declare a variable, but is not an object to be passed around in its own right) and it isn't always clear which bits are C-like and which bits are Python-like.