Tags: python, numpy, scipy, interpolation, benchmarking

Replace zeros in a NumPy array with linear interpolation between the preceding and succeeding values


Assuming we have an array a = np.array([1,2,0,4,0,5,0,0,11]), how can we get:

array([ 1,  2,  3,  4,  4.5,  5,  7,  9, 11])

What I have tried is:

import numpy as np
from scipy.interpolate import interp1d

a = np.array([1,2,0,4,0,5,0,0,11])
b = a[np.nonzero(a)]              # nonzero values only
brange = np.arange(b.shape[0])    # positions 0..len(b)-1 in the compacted array
interp = interp1d(brange, b)

This seems to do the actual job of finding in-between values. For instance:

print (interp(1), interp(1.5), interp(2), interp(2.5), interp(3))
#out: 2.0 3.0 4.0 4.5 5.0

But I can't figure out how to re-construct my original array from interp. I also tried the solution to this question, but I had the exact same problem with that solution as well.

UPDATE:

I ran a quick benchmark of both solutions, one using NumPy/SciPy and one using pandas. Here is the result:

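import numpy as np
import pandas as pd
from scipy.interpolate import interp1d

y = np.array([1,2,0,4,0,5,0,0,11])

def test1(y):
    # interpolate over the original indices, using only the nonzero samples
    x = np.arange(len(y))
    idx = np.nonzero(y)
    interp = interp1d(x[idx],y[idx])

    return interp(x)

def test2(y):
    # note: Series.interpolate fills NaN only, so zeros in y are left unchanged here
    s = pd.Series(y)
    s.interpolate(inplace=True)
    return s.values

%timeit t1 = test1(y)
%timeit t2 = test2(y)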

139 µs ± 1.62 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
158 µs ± 2.01 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

The interp1d version is about 12% faster. That is less than I had hoped for, but since the code is going to be run several million times, it is probably worth the effort.


Solution

  • I think your implementation is a bit off. What you want is something closer to what @Thomas came up with:

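    import numpy as np
    from scipy.interpolate import interp1d

    y = np.array([1,2,0,4,0,5,0,0,11])
    x = np.arange(len(y))                # indices of the original array
    idx = np.nonzero(y)                  # positions of the known (nonzero) samples
    interp = interp1d(x[idx],y[idx])

    ynew = interp(x)                     # evaluate at every original index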
    

    If you want to reconstruct your original array from interp, you just need its .x and .y attributes, which hold the sample positions and values it was built from.

    a_ = np.zeros(interp.x[-1] + 1)
    a_[interp.x] = interp.y
    

    Of course, this will remove any trailing zeros from the original a, as a.size is not preserved in the interpolation. If you have preserved the size elsewhere (such as ynew.shape), you can instead initialize a_ = np.zeros_like(ynew).
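To make the trailing-zero caveat concrete, here is a small sketch of the round trip. It assumes an input with one trailing zero added for illustration; the names short and full are hypothetical, and the index arrays are cast to int defensively before being used for assignment.

```python
import numpy as np
from scipy.interpolate import interp1d

a = np.array([1, 2, 0, 4, 0, 5, 0, 0, 11, 0])  # illustrative input with a trailing zero

x = np.arange(len(a))
idx = np.nonzero(a)
interp = interp1d(x[idx], a[idx])

pos = interp.x.astype(int)     # sample positions, usable as indices

# Sized from the last sample position: the trailing zero is lost.
short = np.zeros(pos[-1] + 1)
short[pos] = interp.y

# Sized from the original array: the trailing zero is kept.
full = np.zeros(a.shape)
full[pos] = interp.y

print(short.size, full.size)
```

Here short has one element fewer than a, while full matches a element for element.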