python-3.x numpy optimization

How can I optimize NumPy code that averages data?


I'm currently trying to optimize a function in Python.

Is there a clever way to use NumPy to get the result in less time?

I have data of shape (30000,18000) and a window_size of 2.

import time
import numpy as np
def denoise(data, window_size, axis=0):
    # Average each non-overlapping block of window_size columns.
    output = np.zeros((data.shape[axis], data.shape[axis+1] // window_size))
    i = 0
    # Stop at the last start index that still leaves a full window.
    for k in range(0, data.shape[axis + 1] - window_size + 1, window_size):
        output[:,i] = np.sum(data[:,k:k+window_size],axis=axis+1)/window_size
        i = i+1
    return output

data = np.random.random((30000,18000))
start = time.time()
output = denoise(data,2)
print(f'Elapsed time {time.time()-start} s')

Solution

  • You can make a small but significant improvement to your current method.

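    def denoise(data, window_size):
        # Ignore trailing columns that do not fill a complete window.
        n = data.shape[1] - data.shape[1] % window_size
        # Slice k holds the k-th column of every window; add the slices, then divide.
        output = sum( data[:, k:n:window_size] for k in range(window_size) )/window_size
        return output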
    

    This replaces the Python loop over every window with just window_size strided slices (two, for your data), which NumPy adds together in vectorized operations. The time goes from 25s to less than 2s for me.

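    Note that the axis argument doesn't work in your original example: the slice data[:,k:k+window_size] always windows along the columns, and for a 2-D array with axis=1, data.shape[axis+1] raises an IndexError.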
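
If you do want a working axis argument, one common alternative is to reshape the array so that each window gets its own trailing axis, then take the mean over that axis. The following is only a minimal sketch under a few assumptions: the name denoise_reshape is hypothetical, windows do not overlap, and any trailing elements that do not fill a complete window are dropped. I have not timed it against the slicing version above.

```python
import numpy as np


def denoise_reshape(data, window_size, axis=1):
    """Average non-overlapping windows of window_size along axis.

    Hypothetical helper for illustration; incomplete trailing windows are dropped.
    """
    # Move the axis being averaged to the end so the reshape below is simple.
    moved = np.moveaxis(data, axis, -1)
    n_windows = moved.shape[-1] // window_size
    # Drop elements that do not fill a complete window.
    trimmed = moved[..., : n_windows * window_size]
    # Split the last axis into (n_windows, window_size) and average each window.
    windows = trimmed.reshape(moved.shape[:-1] + (n_windows, window_size))
    averaged = windows.mean(axis=-1)
    # Put the shortened axis back in its original position.
    return np.moveaxis(averaged, -1, axis)


small = np.arange(12, dtype=float).reshape(2, 6)
# Pairs of columns are averaged: 0.5, 2.5, 4.5 in the first row, 6.5, 8.5, 10.5 in the second.
print(denoise_reshape(small, 2))
# With axis=0 the two rows are averaged instead, giving a single row 3, 4, ..., 8.
print(denoise_reshape(small, 2, axis=0))
```

Note that reshape may need to copy the data when the moved or trimmed array is not contiguous, so memory use for a (30000,18000) input is worth checking before relying on this.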