
How to decrease the execution time of a standard deviation calculated azimuthally


I am using the code below to calculate the standard deviation azimuthally for a 2D array converted from an image. However, the code takes a couple of minutes to run. Can anyone suggest how it can be made faster?

import numpy as np

def radial_profile(data, center):
    y, x = np.indices(data.shape)

    # Integer radial distance of every pixel from the center
    r = np.sqrt((x - center[0])**2 + (y - center[1])**2)
    r = r.astype(np.int64)

    # Standard deviation of all pixels sharing the same integer radius
    radialprofile = np.zeros(np.max(r) + 1)
    for i in range(np.max(r) + 1):
        radialprofile[i] = np.std(data[r == i])

    return radialprofile

data = np.random.randint(10, size=(4096, 4096))
std_azi = radial_profile(data, [2048, 2048])

CPU times: total: 1min 1s
Wall time: 1min 1s

As the CPU and wall times show, it takes about a minute to run. I have 10,000 such astronomical images that need to be processed, so any suggestion on how to make this code faster would be highly appreciated.


Solution

  • The main issue is that data[r == i] is evaluated more than 2000 times, and each evaluation scans the whole array. This is very inefficient, especially since data is quite big.

    One could use a group-by strategy so as to do all of this at once. That being said, NumPy does not support group-by operations. AFAIK there are packages that provide them, but we can design a more efficient implementation based on the specific properties of this code (e.g. the group indices are relatively small integers). We can implement this in Numba. The implementation needs two passes for the sake of numerical accuracy: the first pass computes the mean of each group, and the second accumulates the squared deviations from that mean.
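
    To see why two passes matter, here is a small illustration (not from the original answer): the one-pass formula E[x^2] - E[x]^2 suffers catastrophic cancellation when the mean is large relative to the spread, which can easily happen with image data.

    import numpy as np

    x = 1e8 + np.random.rand(1_000_000)  # large offset, small spread

    # One-pass: both terms are ~1e16 while their true difference is ~0.08,
    # far below float64 resolution at that magnitude (can even come out negative)
    one_pass_var = np.mean(x**2) - np.mean(x)**2

    # Two-pass: subtract the mean first, then square the residuals
    two_pass_var = np.mean((x - np.mean(x))**2)

    print(one_pass_var, two_pass_var)  # garbage vs. ~1/12 ≈ 0.0833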

    Here is the resulting code:

    import numpy as np
    import numba as nb
    
    # Eagerly compiled for C-contiguous int32 images; adjust the signature to your dtype
    @nb.njit('(int32[:,::1], UniTuple(int32,2))')
    def radial_profile(data, center):
        dh, dw = data.shape
        cx, cy = center
        # Largest possible integer radius: distance from the center to the farthest corner
        rMax = np.int64(np.sqrt(max(cx**2 + cy**2, cx**2 + (dh-cy)**2, (dw-cx)**2 + cy**2, (dw-cx)**2 + (dh-cy)**2)))
    
        # Step 1: computes the mean by group
        sumByGroup = np.zeros(rMax + 1)
        groupSize = np.zeros(rMax + 1, dtype=np.int64)
        for y in range(dh):
            for x in range(dw):
                r = np.int64(np.sqrt((x - cx)**2 + (y - cy)**2))
                sumByGroup[r] += data[y, x]
                groupSize[r] += 1
        meanByGroup = sumByGroup / groupSize
    
        # Step 2: computes the variance by group based on the mean by group
        sqSumByGroup = np.zeros(rMax + 1)
        for y in range(dh):
            for x in range(dw):
                r = np.int64(np.sqrt((x - cx)**2 + (y - cy)**2))
                sqSumByGroup[r] += (data[y, x] - meanByGroup[r])**2
        varianceByGroup = sqSumByGroup / groupSize
    
        return np.sqrt(varianceByGroup)
    
    std_azi = radial_profile(data, (2048,2048))
    

    Performance results

    Here are the results on my machine with an i5-9600KF processor:

    Initial code:  33154 ms
    New code:         63 ms
    

    The new code is 526 times faster than the initial one.
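
    As an aside, the same two-pass group-by can also be written in pure NumPy with np.bincount (this variant is not from the original answer, just a sketch). It avoids the Numba dependency at the cost of materializing several full-size temporaries, so it should land somewhere between the original loop and the Numba kernel in speed:

    import numpy as np

    def radial_profile_bincount(data, center):
        y, x = np.indices(data.shape)
        r = np.sqrt((x - center[0])**2 + (y - center[1])**2).astype(np.int64).ravel()
        vals = data.ravel().astype(np.float64)

        # First pass: per-radius counts and sums via bincount
        groupSize = np.bincount(r)
        meanByGroup = np.bincount(r, weights=vals) / groupSize

        # Second pass: per-radius sum of squared deviations from the mean
        sqSumByGroup = np.bincount(r, weights=(vals - meanByGroup[r])**2)
        return np.sqrt(sqSumByGroup / groupSize)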

    Note that this code is sequential, but each image can be processed in parallel. The resulting parallel code should scale very well since the computation is clearly compute-bound.
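
    For instance, here is a minimal sketch of how the 10,000 images could be spread over worker processes with concurrent.futures. The file pattern and load_image are assumptions: replace them with whatever actually reads your images (e.g. astropy.io.fits):

    import glob
    from concurrent.futures import ProcessPoolExecutor

    def process_one(path):
        # load_image is a hypothetical loader; swap in your real image reader
        data = load_image(path)
        return radial_profile(data, (2048, 2048))

    if __name__ == '__main__':
        paths = sorted(glob.glob('images/*.fits'))  # assumed file layout
        with ProcessPoolExecutor() as pool:
            # Whole images are independent; chunking reduces inter-process overhead
            profiles = list(pool.map(process_one, paths, chunksize=16))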