
How to decrease the execution time of a standard deviation calculated azimuthally


I am using the code below to calculate the standard deviation azimuthally for a 2D array converted from an image. However, the code takes a couple of minutes to run. Can anyone suggest how it can be made faster?

import numpy as np

def radial_profile(data, center):
    y, x = np.indices(data.shape)

    # Integer radial distance of every pixel from the center
    r = np.sqrt((x - center[0])**2 + (y - center[1])**2)
    r = r.astype(np.int64)

    # Standard deviation of all pixels sharing the same integer radius
    radialprofile = np.zeros(np.max(r) + 1)
    for i in range(np.max(r) + 1):
        radialprofile[i] = np.std(data[r == i])

    return radialprofile

data = np.random.randint(10, size=(4096, 4096))
std_azi = radial_profile(data, [2048, 2048])

CPU times: total: 1min 1s
Wall time: 1min 1s

As the CPU and wall times show, it takes about a minute to run. I have 10,000 such astronomical images that need to be processed, so any suggestion on how to make this code faster would be highly appreciated.


Solution

  • The main issue is that data[r == i] is evaluated more than 2000 times, and each evaluation scans the whole array. This is very inefficient, especially since data is quite big.

    One could use a group-by strategy so as to do all of this at once. That being said, NumPy does not support group-by operations. AFAIK there are packages that provide them, but we can design a more efficient implementation based on the specific properties of this code (e.g. the group indices are relatively small integers). We can implement this in Numba. The implementation needs two passes for the sake of numerical accuracy: the first pass computes the mean of each group, and the second accumulates the squared deviations from that mean.
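
    To see why two passes matter, here is a small illustration (not from the original answer): the one-pass formula E[x^2] - E[x]^2 suffers catastrophic cancellation when the mean is large relative to the spread, which can easily happen with image data.

    import numpy as np

    x = 1e8 + np.random.rand(1_000_000)  # large offset, small spread

    # One-pass: both terms are ~1e16 while their true difference is ~0.08,
    # far below float64 resolution at that magnitude (can even come out negative)
    one_pass_var = np.mean(x**2) - np.mean(x)**2

    # Two-pass: subtract the mean first, then square the residuals
    two_pass_var = np.mean((x - np.mean(x))**2)

    print(one_pass_var, two_pass_var)  # garbage vs. ~1/12 ≈ 0.0833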

    Here is the resulting code:

    import numpy as np
    import numba as nb
    
    # Eagerly compiled for C-contiguous int32 images; adjust the signature to your dtype
    @nb.njit('(int32[:,::1], UniTuple(int32,2))')
    def radial_profile(data, center):
        dh, dw = data.shape
        cx, cy = center
        # Largest possible integer radius: distance from the center to the farthest corner
        rMax = np.int64(np.sqrt(max(cx**2 + cy**2, cx**2 + (dh-cy)**2, (dw-cx)**2 + cy**2, (dw-cx)**2 + (dh-cy)**2)))
    
        # Step 1: computes the mean by group
        sumByGroup = np.zeros(rMax + 1)
        groupSize = np.zeros(rMax + 1, dtype=np.int64)
        for y in range(dh):
            for x in range(dw):
                r = np.int64(np.sqrt((x - cx)**2 + (y - cy)**2))
                sumByGroup[r] += data[y, x]
                groupSize[r] += 1
        meanByGroup = sumByGroup / groupSize
    
        # Step 2: computes the variance by group based on the mean by group
        sqSumByGroup = np.zeros(rMax + 1)
        for y in range(dh):
            for x in range(dw):
                r = np.int64(np.sqrt((x - cx)**2 + (y - cy)**2))
                sqSumByGroup[r] += (data[y, x] - meanByGroup[r])**2
        varianceByGroup = sqSumByGroup / groupSize
    
        return np.sqrt(varianceByGroup)
    
    std_azi = radial_profile(data, (2048,2048))
    

    Performance results

    Here are the results on my machine with an i5-9600KF processor:

    Initial code:  33154 ms
    New code:         63 ms
    

    The new code is 526 times faster than the initial one.
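
    As an aside, the same two-pass group-by can also be written in pure NumPy with np.bincount (this variant is not from the original answer, just a sketch). It avoids the Numba dependency at the cost of materializing several full-size temporaries, so it should land somewhere between the original loop and the Numba kernel in speed:

    import numpy as np

    def radial_profile_bincount(data, center):
        y, x = np.indices(data.shape)
        r = np.sqrt((x - center[0])**2 + (y - center[1])**2).astype(np.int64).ravel()
        vals = data.ravel().astype(np.float64)

        # First pass: per-radius counts and sums via bincount
        groupSize = np.bincount(r)
        meanByGroup = np.bincount(r, weights=vals) / groupSize

        # Second pass: per-radius sum of squared deviations from the mean
        sqSumByGroup = np.bincount(r, weights=(vals - meanByGroup[r])**2)
        return np.sqrt(sqSumByGroup / groupSize)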

    Note that this code is sequential, but each image can be processed in parallel. The resulting parallel code should scale very well since the computation is clearly compute-bound.
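
    For instance, here is a minimal sketch of how the 10,000 images could be spread over worker processes with concurrent.futures. The file pattern and load_image are assumptions: replace them with whatever actually reads your images (e.g. astropy.io.fits):

    import glob
    from concurrent.futures import ProcessPoolExecutor

    def process_one(path):
        # load_image is a hypothetical loader; swap in your real image reader
        data = load_image(path)
        return radial_profile(data, (2048, 2048))

    if __name__ == '__main__':
        paths = sorted(glob.glob('images/*.fits'))  # assumed file layout
        with ProcessPoolExecutor() as pool:
            # Whole images are independent; chunking reduces inter-process overhead
            profiles = list(pool.map(process_one, paths, chunksize=16))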