Tags: python, python-3.x, loops, cython, typed-memory-views

How can I get better loop performance in Cython?


I have started a project in Python that consists mostly of loops. A few days ago I read about Cython, which can make code faster through static typing. I wrote these two functions to compare the performance (one in Python, the other in Cython):

import numpy as np
from time import perf_counter as clock  # time.clock was removed in Python 3.8

size = 11
board = np.random.randint(2, size=(size, size))

def py_playout(board, N):
    black_rave = []
    white_rave = []
    for i in range(N):
        for x in range(board.shape[0]):
            for y in range(board.shape[1]):
                if board[(x,y)] == 0:
                    black_rave.append((x,y))
                else:
                    white_rave.append((x,y))
    return black_rave, white_rave

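# Cython version (compiled separately, e.g. in a .pyx file).
# cpdef rather than cdef, so that the function can be called from Python.
cpdef cy_playout(board, int N):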
    cdef list white_rave = [], black_rave = []
    cdef int M = board.shape[0], L = board.shape[1]
    cdef int i=0, x=0, y=0
    for i in range(N):
        for x in range(M):
            for y in range(L):
                if board[(x,y)] == 0:
                    black_rave.append((x,y))
                else:
                    white_rave.append((x,y))
    return black_rave, white_rave

I then used the code below to compare the performance:

t1 = clock()
a = py_playout(board, 1000)
t2 = clock()
b = cy_playout(board, 1000)
t3 = clock()

py = t2 - t1
cy = t3 - t2
print('cy is %.2f times faster than py' % (py / cy))

However, I didn't see any noticeable improvement. I haven't used typed memoryviews yet. Can anybody suggest a way to improve the speed, or help me rewrite the code using typed memoryviews?


Solution

  • You're right: without a type on the board parameter of the Cython function, the speedup is modest:

    %timeit py_playout(board, 1000)
    # 321 ms ± 19.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    %timeit cy_playout(board, 1000)
    # 186 ms ± 541 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
    

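    That is still almost a factor of two. Adding a type to board, e.g.

    cpdef cy_playout(int[:, :] board, int N):
        # ... same body as before
    
    # `int` here is a C int, so the array's dtype must match it (np.intc).
    # If it does not, use an explicit type instead:
    # cimport numpy as np
    # cpdef cy_playout(np.int64_t[:, :] board, int N):  # or np.int32_t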
    

    makes it much faster (roughly 8 times faster than the pure-Python version):

    %timeit cy_playout(board, 1000)
    # 38.7 ms ± 1.84 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
    

    I used timeit (well, the IPython magic %timeit) instead of manual clock calls to get more accurate timings.


    Note that you can also use numba to achieve similar speedups without any additional static typing:

    import numba as nb
    
    nb_playout = nb.njit(py_playout)  # Same as decorating your Python function with @nb.njit
    
    %timeit nb_playout(board, 1000)
    # 37.5 ms ± 154 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
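
The timings above were taken with IPython's %timeit. Outside IPython, something similar can be done with the standard-library timeit module. The following is only a minimal sketch: the helper name best_time and the list-of-lists board are assumptions made to keep the example free of dependencies, and the stand-in py_playout only mirrors the question's loop structure. In practice you would pass your real py_playout, cy_playout or nb_playout together with the NumPy board.

```python
import timeit


def py_playout(board, n):
    """Stand-in for the question's function, on a list of lists (assumption)."""
    black_rave = []
    white_rave = []
    for _ in range(n):
        for x, row in enumerate(board):
            for y, cell in enumerate(row):
                if cell == 0:
                    black_rave.append((x, y))
                else:
                    white_rave.append((x, y))
    return black_rave, white_rave


def best_time(func, *args, repeat=5, number=10):
    """Hypothetical helper: best per-call time in seconds over `repeat` runs."""
    # Warm-up call, so one-time costs (e.g. JIT compilation) are not measured.
    func(*args)
    timings = timeit.repeat(lambda: func(*args), repeat=repeat, number=number)
    # The minimum is the run least disturbed by other processes.
    return min(timings) / number


# Hypothetical 11x11 board with a checkerboard pattern of 0s and 1s.
board = [[(x + y) % 2 for y in range(11)] for x in range(11)]

seconds = best_time(py_playout, board, 100)
print(f"py_playout: {seconds * 1e3:.3f} ms per call")
```

The warm-up call matters most for JIT-compiled functions such as the numba version, whose first call includes compilation time. Taking the minimum over several repeats is less noisy than a single pair of clock readings.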