python parallel-processing cython physics montecarlo

How can I use 'prange' in Cython?

I'm trying to solve a two-dimensional Ising model with a Monte Carlo approach.

As it is slow, I used Cython to accelerate the code execution. I would like to push it even further and parallelize the Cython code. My idea is to split the two-dimensional lattice in two, so for any point on a lattice has its nearest neighbours on the other lattice. This way, I can randomly choose one lattice, and I can flip all the spins and this could be done in parallel since all those spins are independent.

So far this is my code:
_{(inspired from http://jakevdp.github.io/blog/2017/12/11/live-coding-cython-ising-model/)}

%load_ext Cython
%%cython
cimport cython
cimport numpy as np
import numpy as np
from cython.parallel cimport prange

@cython.boundscheck(False)
@cython.wraparound(False)

def cy_ising_step(np.int64_t[:, :] field,float beta):

    cdef int N = field.shape[0]
    cdef int M = field.shape[1]

    cdef int offset = np.random.randint(0,2)

    cdef np.int64_t[:,] n_update = np.arange(offset,N,2,dtype=np.int64)

    cdef int m,n,i,j

    for m in prange(M,nogil=True):
        i = m % 2
        for j in range(n_update.shape[0]) :
            n = n_update[j]

            cy_spin_flip(field,(n+i) %N,m%M,beta)

    return np.array(field,dtype=np.int64)


cdef cy_spin_flip(np.int64_t[:, :] field,int n,int m, float beta=0.4,float J=1.0):

    cdef int N = field.shape[0]
    cdef int M = field.shape[1]

    cdef float dE = 2*J*field[n,m]*(field[(n-1)%N,m]+field[(n+1)%N,m]+field[n,(m-1)%M]+field[n,(m+1)%M])

    if dE <= 0 :
        field[n,m] *= -1

    elif np.exp(-dE * beta) > np.random.rand():
        field[n,m] *= -1

I tried using a prange-constructor, but I'm having a lots of troubles with GIL-lock. I’m new to Cython and parallel computing so I could easily have missed something.

The error:

Discarding owned Python object not allowed without gil
Calling gil-requiring function not allowed without gil

Solution

From a Cython point-of-view the main problem is that cy_spin_flip requires the GIL. You need to add nogil to the end of its signature, and set the return type to void (since by default it returns a Python object, which requires the GIL).

However, np.exp and np.random.rand also require the GIL, because they're Python function calls. np.exp is probably easily replaced with libc.math.exp. np.random is a bit harder, but there's plenty of suggestions for C- and C++-based approaches: 1 2 3 4 (+ others).

A more fundamental problem is the line:

cdef float dE = 2*J*field[n,m]*(field[(n-1)%N,m]+field[(n+1)%N,m]+field[n,(m-1)%M]+field[n,(m+1)%M])

You've parallelized this with respect to m (i.e. different values of m are run in different threads), and each iteration changes field. However in this line you are looking up several different values of m. This means the whole thing is a race-condition (the result depends on which order the different threads finish) and suggests your algorithm may be fundamentally unsuitable for parallelization. Or that you should copy field and have field_in and field_out. It isn't obvious to me, but this is something that you should be able to work out.

Edit: it does look like you've given the race condition some thought with using i%2. It isn't obvious to me that this is right though. I think a working implementation of your "alternate cells" scheme would look something like:

for oddeven in range(2):
    for m in prange(M):
        for n in range(N):
            # some mechanism to pick the alternate cells here.

i.e. you need a regular loop to pick the alternate cells outside your parallel loop.