Speed up Python/Numpy Code for Ising-/Potts-Model Monte-Carlo

For a small side-project I have written an 2D Ising-/Potts-Model Monte-Carlo simulation in Python/Numpy with the (simplified) code below.

Essentially the code does the following:

Sets up a NXM array filled with random integers (orientations) in [1,orientations]
for every time-step (MCS) the every pixel of the array is visited once in a pseudo-randomized order (therefore largest prime under() and index = (a*i + rand) % (N**2) x = index % N y = index // N )
the 8 neighboring array entries are checked (periodic boundary conditions) and the entry is changed to one of the neighboring values
if the energy of the new configuration becomes lower the change is accepted, else it is rejected unless a condition is fulfilled

I tried to speed it up as much as I could think of, but for large arrays (N,M > 500) the code isn't really fast. As I need about 10e5 MCS for an array to see a clear trend, the achieved

1 loop, best of 3: 276 ms per loop

for 1 MCS on a 100x100 array isn't really sufficient. Unfortunately I don't know how to increase the performance given my lack of experience.

I assume the Neighbors() and the calc_dE() functions are the bottle-necks and especially the nested loops but I can't find a way to speed it up. My cython trys weren't really successful as I haven't done anything with cython before so I'm open to any suggestion.

CODE:

(The pyplot commands are just for visualization and are usually commented.)

import math
import numpy as np
import matplotlib.pyplot as plt

def largest_primes_under(N):
    n = N - 1
    while n >= 2:
        if all(n % d for d in range(2, int(n ** 0.5 + 1))):
            return n
        n -= 1

def Neighbors(Lattice,i,j,n=1):
    ''' Returns an flat array of all neighboring sites in the n-th coordination sphere including the center'''
    N, M = Lattice.shape
    rows = [(i-1) % N, i, (i+1) % N]
    cols = [(j-1) % N, j, (j+1) % M]
    return Lattice[rows][:, cols].flatten()

def calc_dE(Lattice, x, y, z):
    N, M = Lattice.shape
    old_energy = 0
    new_energy = 0
    for i in [0,1,-1]:
        for j in [0,1,-1]:
            if i == 0 and j == 0: 
                continue
            if Lattice[x%N,y%M] == Lattice[(x+i)%N,(y+j)%M]:
                old_energy += 1
            elif z == Lattice[(x+i)%N,(y+j)%M]: 
                new_energy += 1 
    return old_energy-new_energy

N, M = 100,100
orientations = N*M
MCS = int(100)

a = largest_primes_under(N*M)
L = np.random.randint(1,orientations+1,size=(N,M))

mat = plt.matshow(L,cmap = plt.get_cmap('plasma', orientations+1), vmin = -0.5, vmax = orientations+0.5, interpolation='kaiser')
plt.axis('off')

for t in range(1,MCS+1):
    rand = np.random.random_integers(N*M)
    for i in range(0,N**2):
        index = (a*i + rand) % (N**2)
        x = index % N
        y = index // N
        n = Neighbors(L,x,y)
        if len(n)-1 == 0: 
            continue
        else: 
            z = np.random.choice(n)
        dE = calc_dE(L,x,y,z)
        if  (dE < 0): 
            L[x%N,y%N] = z      
        elif np.random.sample() < math.exp(-dE*2.5): 
            L[x%N,y%N] = z

    mat.set_data(L)
    plt.draw()
    plt.pause(0.0001)

Solution

Not sure if you have any sort of restrictions in terms of dependencies, but I would definitely look into Numba. It provides a set of decorators (njit, in particular) that will compile your code to machine code and make it potentially much, much faster, provided you are using compatible types (e.g. numpy arrays, but not pandas DataFrames).

Also, not sure what scale you are looking at, but I'm quite you can find examples of much better optimized prime search algorithms than a manually implemented for loop.

Otherwise you can always fall-back on Cython, but it requires re-writing your code.

EDIT: ok, I gave it a try with numba.

A few notes:

moved the whole for loop inside a function, so that you can decorate it with njit
in Neighbors, I had to change rows and cols from lists to np.arrays because numba does not accept indexing through lists
I replaced np.random.random_integers with np.random.randint since the former is deprecated
I replaced math.exp with np.exp which should give a minor performance boost (besides saving you an import)

import numpy as np
from numba import njit

def largest_primes_under(N):
    n = N - 1
    while n >= 2:
        if all(n % d for d in range(2, int(n ** 0.5 + 1))):
            return n
        n -= 1

@njit
def Neighbors(Lattice,i,j,n=1):
    ''' Returns an flat array of all neighboring sites in the n-th coordination sphere including the center'''
    N, M = Lattice.shape
    rows = np.array([(i-1) % N, i, (i+1) % N])
    cols = np.array([(j-1) % N, j, (j+1) % M])
    return Lattice[rows,:][:,cols].flatten()

@njit
def calc_dE(Lattice, x, y, z):
    N, M = Lattice.shape
    old_energy = 0
    new_energy = 0
    for i in [0,1,-1]:
        for j in [0,1,-1]:
            if i == 0 and j == 0: 
                continue
            if Lattice[x%N,y%M] == Lattice[(x+i)%N,(y+j)%M]:
                old_energy += 1
            elif z == Lattice[(x+i)%N,(y+j)%M]: 
                new_energy += 1 
    return old_energy-new_energy

@njit
def fun(L, MCS, a):
    N, M = L.shape

    for t in range(1,MCS+1):
        rand = np.random.randint(N*M)
        for i in range(0,N**2):
            index = (a*i + rand) % (N**2)
            x = index % N
            y = index // N
            n = Neighbors(L,x,y)
            if len(n)-1 == 0: continue
            else: z = np.random.choice(n)
            dE = calc_dE(L,x,y,z)
            if  (dE < 0): L[x%N,y%N] = z      
            elif np.random.sample() < np.exp(-dE*2.5): L[x%N,y%N] = z    
    return L

Running the same example

N, M = 100,100
orientations = N*M
MCS = 1

L = np.random.randint(1,orientations+1,size=(N,M))
a = largest_primes_under(N*M)

through %timeit fun(L, MCS, a) (in Jupyter) gives me

16.9 ms ± 853 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

which is ~15 times faster than what you currently have. There are probably more optimizations you can do, the nice thing about numba is that I obtained a x15 speed up without delving into or changing significantly how your code is implemented.

A couple of general observations:

in Neighbors, the argument/parameter n is not used, so you should remove it for clarity (or update the code)
in Python, you generally want to use lowercase for function names and variables. Uppercase it usually reserved for Classes (not objects) and "global" variables
Your comment above about largest_primes_under being called only once is definitely spot on, I should have looked better at the code.

premature optimization is the root of all evil