I need to perform this following integration for a 2D array:
That is, each point in the grid get the value RC, which is integration over 2D of the difference between the whole field and the value of the field U at certain point (x,y), multiplying the normalized kernel, that in 1D version is:
What I did so far is an inefficient iteration over indexes:
def normalized_bimodal_kernel_2D(x,y,a,x0=0.0,y0=0.0):
""" Gives a kernel that is zero in x=0, and its integral from -infty to
+infty is 1.0. The parameter a is a length scale where the peaks of the
function are."""
dist = (x-x0)**2 + (y-y0)**2
return (dist*np.exp(-(dist/a)))/(np.pi*a**2)
def RC_2D(U,a,dx):
nx,ny=U.shape
x,y = np.meshgrid(np.arange(0,nx, dx),np.arange(0,ny,dx), sparse=True)
UB = np.zeros_like(U)
for i in xrange(0,nx):
for j in xrange(0,ny):
field=(U-U[i,j])*normalized_bimodal_kernel_2D(x,y,a,x0=i*dx,y0=j*dx)
UB[i,j]=np.sum(field)*dx**2
return UB
def centerlizing_2D(U,a,dx):
nx,ny=U.shape
x,y = np.meshgrid(np.arange(0,nx, dx),np.arange(0,ny,dx), sparse=True)
UB = np.zeros((nx,ny,nx,ny))
for i in xrange(0,nx):
for j in xrange(0,ny):
UB[i,j]=normalized_bimodal_kernel_2D(x,y,a,x0=i*dx,y0=j*dx)
return UB
You can see the result of the centeralizing
function here:
U=np.eye(20)
plt.imshow(centerlizing(U,10,1)[10,10])
I'm sure I have additional bugs, so any feedback will be warmly welcome, but what I am really interested is understanding how I can do this operation much faster in vectorized way.
normalized_bimodal_kernel_2D
is called in the two nested loops that each shift only it's offset by small steps. This duplicates many computations.
An optimization for centerlizing_2D
is to compute the kernel once for a bigger range and then define UB
to take shifted views into that. This is possible using stride_tricks
, which unfortunately is rather advanced numpy.
def centerlizing_2D_opt(U,a,dx):
nx,ny=U.shape
x,y = np.meshgrid(np.arange(-nx//2, nx+nx//2, dx),
np.arange(-nx//2, ny+ny//2, dx), # note the increased range
sparse=True)
k = normalized_bimodal_kernel_2D(x, y, a, x0=nx//2, y0=ny//2)
sx, sy = k.strides
UB = as_strided(k, shape=(nx, ny, nx*2, ny*2), strides=(sy, sx, sx, sy))
return UB[:, :, nx:0:-1, ny:0:-1]
assert np.allclose(centerlizing_2D(U,10,1), centerlizing_2D_opt(U,10,1)) # verify it's correct
Yep, it's faster:
%timeit centerlizing_2D(U,10,1) # 100 loops, best of 3: 9.88 ms per loop
%timeit centerlizing_2D_opt(U,10,1) # 10000 loops, best of 3: 85.9 µs per loop
Next, we optimize RC_2D
by expressing it with the optimized centerlizing_2D
routine:
def RC_2D_opt(U,a,dx):
UB_tmp = centerlizing_2D_opt(U, a, dx)
U_tmp = U[:, :, None, None] - U[None, None, :, :]
UB = np.sum(U_tmp * UB_tmp, axis=(0, 1))
return UB
assert np.allclose(RC_2D(U,10,1), RC_2D_opt(U,10,1))
Performance of %timeit RC_2D(U,10, 1)
:
#original: 100 loops, best of 3: 13.8 ms per loop
#@DanielF's: 100 loops, best of 3: 6.98 ms per loop
#mine: 1000 loops, best of 3: 1.83 ms per loop