python performance numpy array-broadcasting

Broadcast operation on array of smaller size

I need to improve the performance of an operation performed on arrays of different shapes/sizes. The array pos has a shape of (2, 500) and the xa, xb, ya, yb arrays have shapes of (30,).

The operation shown in the MVCE below combines each of the two dimensions of pos with xa, xb and ya, yb.

Can this be done applying numpy broadcasting?

import numpy as np

# Some random data
N = 30
xa, xb = np.random.uniform(0., 1., N), np.random.uniform(0., 1., N)
ya, yb = np.random.uniform(0., 1., N), np.random.uniform(0., 1., N)

# Grid
M = 500
ext = [xa.min(), xa.max(), ya.min(), ya.max()]
x, y = np.mgrid[ext[0]:ext[1]:complex(0, M), ext[2]:ext[3]:complex(0, M)]
pos = np.vstack([x.ravel(), y.ravel()])

# Apply broadcasting on the operation performed by this 'for' block?
vals = []
for p in zip(*pos):
    vals.append(np.sum(np.exp(-0.5 * (
        ((p[0] - xa) / xb)**2 + ((p[1] - ya) / yb)**2)) / (xb * yb)))

Solution

You can use np.tile and modify the for loop as follows

xa_tiled = np.tile(xa, (pos.shape[1],1))
xb_tiled = np.tile(xb, (pos.shape[1],1))
ya_tiled = np.tile(ya, (pos.shape[1],1))
yb_tiled = np.tile(yb, (pos.shape[1],1))

vals_ = np.exp(-0.5 * (
         ((pos[0].reshape(pos.shape[1],1) - xa_tiled) / xb_tiled)**2 + ((pos[1].reshape(pos.shape[1],1) - ya_tiled) / yb_tiled)**2)) / (xb_tiled * yb_tiled)
vals_ = vals_.sum(axis=1)

Explanation:

In each iteration you are taking pos[0][i] and pos[1][i] and doing operation on xa, xb, ya, yb.
Tile replicates all 4 of this 250000 times which is the shape[1] of pos or the number of iterations.
We also need to reshape pos[0] and pos[1] and just make them 2D for operations to be valid.

Timing details: On my machine vectorized code takes ~ .20 sec whereas non-vectorized code takes around 3 sec. Below is the code to reproduce:

import numpy as np
import time

# Some random data
N = 30
xa, xb = np.random.uniform(0., 1., N), np.random.uniform(0., 1., N)
ya, yb = np.random.uniform(0., 1., N), np.random.uniform(0., 1., N)

# Grid
M = 500
ext = [xa.min(), xa.max(), ya.min(), ya.max()]
x, y = np.mgrid[ext[0]:ext[1]:complex(0, M), ext[2]:ext[3]:complex(0, M)]
pos = np.vstack([x.ravel(), y.ravel()])

# Apply broadcasting on the operation performed by this 'for' block?

start = time.time()
for i in range(10):
    vals = []
    for p in zip(*pos):
        vals.append(np.sum(np.exp(-0.5 * (
            ((p[0] - xa) / xb)**2 + ((p[1] - ya) / yb)**2)) / (xb * yb)))
stop = time.time()
print( (stop-start)/10)        

start = time.time()

for i in range(10):
    xa_tiled = np.tile(xa, (pos.shape[1],1))
    xb_tiled = np.tile(xb, (pos.shape[1],1))
    ya_tiled = np.tile(ya, (pos.shape[1],1))
    yb_tiled = np.tile(yb, (pos.shape[1],1))

    vals_ = np.exp(-0.5 * (
            ((pos[0,:].reshape(pos.shape[1],1) - xa_tiled) / xb_tiled)**2 + ((pos[1].reshape(pos.shape[1],1) - ya_tiled) / yb_tiled)**2)) / (xb_tiled * yb_tiled)
    vals_ = vals_.sum(axis=1)
stop = time.time()
print( (stop-start)/10)        

print(np.allclose(vals_, np.array(vals))==True)