This is my first post, I am trying to make an ising model simulation using a monte carlo algorithm. I order to optimise the code I am currently trying to use Cython. However for now the execution time of the python and cython programm are identical. If anyone has an idea to either improve the cython implementation or to make the program run faster without using cython please let me know.
Thanks
PS : here is the code
#cython: language_level=3
import numpy as np
import time
cimport numpy as np
cpdef gen_neighbors_2d(int a):
cdef int n = a * a
cdef np.ndarray neighbors = np.zeros((n, 4), dtype=np.int32)
cdef int i, j, index
for i in range(a):
for j in range(a):
index = i * a + j
neighbors[index, 0] = i * a + (j - 1) % a # left
neighbors[index, 1] = i * a + (j + 1) % a # right
neighbors[index, 2] = ((i - 1) % a) * a + j # up
neighbors[index, 3] = ((i + 1) % a) * a + j # down
return neighbors
cpdef int MH_single_flip(np.ndarray[int, ndim=2] neighbors_list, double T, int iterations, int size):
cdef np.ndarray[int, ndim=1] spins = np.random.choice([-1, 1], size=size)
cdef int step, n, spin, delta_E
for step in range(iterations):
n = np.random.randint(0, size)
spin = spins[n]
delta_E = 2 * spin * np.add.reduce(spins[neighbors_list[n]])
if delta_E < 0 or -T*np.log(np.random.random(1)[0]) > delta_E:
spins[n] = -1*spin
return spins
neighbors_list = gen_neighbors_2d(100)
T_list = np.linspace(1, 4, 1000)
size = neighbors_list.shape[0]
t1 = time.time()
for T in range(len(T_list)):
blank_var = MH_single_flip(neighbors_list, T_list[T], 5000, size)
t2 = time.time()
print(t2-t1)
I at first tried to run the cython code without any variable definitions but it only made the code run 5 percent faster. Any of the upgrades I added after that increased performance by more than 1 percent. Any help is welcome !
Let's time your implementation as a benchmark reference:
%timeit MH_single_flip(neighbors_list, T, 5000, size)
>>> 15 ms ± 181 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Here are some hints in arbitrary order (you can find some of them also in DavidW's excellent answer linked in the comments)
Calling numpy functions like np.random.randint
and np.random.random
with scalar arguments inside a loop is pretty inefficient as function calls in Python have noticeable overhead. As numpy ufuncs call C code under the hood, most of the time is spent for checking the function arguments on the Python level rather than the calculation itself. You should always try to do as much (vectorized) calculations as possible with a single numpy function call. That way, you can benefit from the fast calculations under the hood and don't need to worry much about the Python overhead.
Similarly, np.add.reduce
only computes the sum of four integers, so it's dominated by the Python overhead. Here, using a plain C function is reasonable and faster alternative.
You should prefer typed memoryviews to the old typed np.ndarrays. The syntax is cleaner and they are slightly faster.
You can trade safety for speed by disabling all index checks of the numpy arrays / memoryviews by the Cython directive boundschecks
.
Here's some code snippet:
%%cython
import numpy as np
from cython import wraparound, boundscheck, cdivision, initializedcheck
@wraparound(False)
@boundscheck(False)
@initializedcheck(False)
@cdivision(True)
def gen_neighbors_2d(int a):
cdef int n = a * a
cdef int[:, ::1] neighbors = np.zeros((n, 4), dtype=np.int32)
cdef int i, j, index
for i in range(a):
for j in range(a):
index = i * a + j
neighbors[index, 0] = i * a + (j - 1) % a # left
neighbors[index, 1] = i * a + (j + 1) % a # right
neighbors[index, 2] = ((i - 1) % a) * a + j # up
neighbors[index, 3] = ((i + 1) % a) * a + j # down
return np.asarray(neighbors)
@wraparound(False)
@boundscheck(False)
@initializedcheck(False)
cdef int calculate_delta_e(int spin, long[::1] spins, int[::1] neighbors_list):
return 2 * spin * (spins[neighbors_list[0]] + spins[neighbors_list[1]] + spins[neighbors_list[2]] + spins[neighbors_list[3]])
@wraparound(False)
@boundscheck(False)
@initializedcheck(False)
def MH_single_flip(int[:, ::1] neighbors_list, double T, int iterations, int size):
cdef long[::1] spins = np.random.choice([-1, 1], size=size)
cdef int[::1] nns = np.random.randint(0, size, iterations, dtype=np.int32)
cdef double[::1] log_rand_nums = np.log(np.random.random(iterations))
cdef int i, delta_E, n, spin
for i in range(iterations):
n = nns[i]
spin = spins[n]
delta_E = calculate_delta_e(spin, spins, neighbors_list[n])
if delta_E < 0 or -T*log_rand_nums[i] > delta_E:
spins[n] = -1*spin
return np.asarray(spins)
On my machine, this is approximately 75 times faster than your version:
%timeit MH_single_flip(neighbors_list, T, 5000, size)
>>> 213 µs ± 527 ns per loop (mean ± std. dev. of 7 runs, 1,000 loops each)