
Fastest way to use numpy.interp on a 2-D array


I am trying to find the fastest way to apply NumPy's linear interpolation (numpy.interp) row by row to a 2-D array of y-coordinates, with one x-coordinate to evaluate per row.

import numpy as np

xp = [0.0, 0.25, 0.5, 0.75, 1.0]

np.random.seed(100)
x = np.random.rand(10)
fp = np.random.rand(10, 5)

Here xp holds the x-coordinates of the data points, x holds the x-coordinates at which I want to interpolate (one per row of fp), and fp is a 2-D array whose rows hold the y-coordinates of the data points.

xp
[0.0, 0.25, 0.5, 0.75, 1.0]

x
array([ 0.54340494,  0.27836939,  0.42451759,  0.84477613,  0.00471886,
        0.12156912,  0.67074908,  0.82585276,  0.13670659,  0.57509333])

fp
array([[ 0.89132195,  0.20920212,  0.18532822,  0.10837689,  0.21969749],
       [ 0.97862378,  0.81168315,  0.17194101,  0.81622475,  0.27407375],
       [ 0.43170418,  0.94002982,  0.81764938,  0.33611195,  0.17541045],
       [ 0.37283205,  0.00568851,  0.25242635,  0.79566251,  0.01525497],
       [ 0.59884338,  0.60380454,  0.10514769,  0.38194344,  0.03647606],
       [ 0.89041156,  0.98092086,  0.05994199,  0.89054594,  0.5769015 ],
       [ 0.74247969,  0.63018394,  0.58184219,  0.02043913,  0.21002658],
       [ 0.54468488,  0.76911517,  0.25069523,  0.28589569,  0.85239509],
       [ 0.97500649,  0.88485329,  0.35950784,  0.59885895,  0.35479561],
       [ 0.34019022,  0.17808099,  0.23769421,  0.04486228,  0.50543143]])

The desired outcome should look like this:

array([ 0.17196795,  0.73908678,  0.85459966,  0.49980648,  0.59893702,
        0.9344241 ,  0.19840596,  0.45777785,  0.92570835,  0.17977264])

Again, I am looking for the fastest way to do this, because this is a simplified version of my problem, where x has a length of about 1 million rather than 10.

Thanks


Solution

  • So basically you want output equivalent to

    np.array([np.interp(x[i], xp, fp[i]) for i in range(x.size)])
    

    But that Python-level for loop will be pretty slow for large x.size.

    This should work, as long as every value in x satisfies xp[0] < x <= xp[-1]:

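    def multiInterp(x, xp, fp):
        xp = np.asarray(xp)  # xp[j] below needs an array, not a plain list
        # (i, j) pairs such that xp[j] < x[i] <= xp[j + 1]
        i, j = np.nonzero(np.diff(xp[None, :] < x[:, None]))
        d = (x - xp[j]) / np.diff(xp)[j]
        return fp[i, j] + np.diff(fp)[i, j] * d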
    

    EDIT: This version is faster and handles bigger arrays, because it avoids building the full len(x) by len(xp) comparison matrix and the full np.diff(fp):

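    def multiInterp2(x, xp, fp):
        xp = np.asarray(xp)  # xp[j] below needs an array, not a plain list
        i = np.arange(x.size)
        # j[k] is the index of the data point just left of x[k]
        j = np.searchsorted(xp, x) - 1
        # fractional position of x[k] between xp[j] and xp[j + 1]
        d = (x - xp[j]) / (xp[j + 1] - xp[j])
        return (1 - d) * fp[i, j] + fp[i, j + 1] * d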
    

    Testing:

    multiInterp2(x, xp, fp)
    Out: 
    array([ 0.17196795,  0.73908678,  0.85459966,  0.49980648,  0.59893702,
            0.9344241 ,  0.19840596,  0.45777785,  0.92570835,  0.17977264])
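
As a sanity check beyond the ten-element example, a sketch like the following compares the vectorized function with the row-by-row np.interp loop on random inputs. The seed and sizes are arbitrary choices. The function body restates multiInterp2 with two additions that are assumptions of this sketch and not part of the answer as posted: np.asarray (so a plain list works for xp) and np.clip (so an x equal to xp[0] or xp[-1] maps to a valid interval).

```python
import numpy as np

def multiInterp2(x, xp, fp):
    xp = np.asarray(xp)
    i = np.arange(x.size)
    # Interval index per x; clipped so the endpoints stay inside 0..len(xp)-2.
    j = np.clip(np.searchsorted(xp, x) - 1, 0, xp.size - 2)
    d = (x - xp[j]) / (xp[j + 1] - xp[j])
    return (1 - d) * fp[i, j] + fp[i, j + 1] * d

rng = np.random.default_rng(0)   # arbitrary seed
m, n = 1000, 50                  # arbitrary sizes
xp = np.linspace(0.0, 1.0, n)
x = rng.random(m)
x[:2] = [0.0, 1.0]               # include both endpoints of xp
fp = rng.random((m, n))

# Reference: one np.interp call per row of fp.
reference = np.array([np.interp(x[k], xp, fp[k]) for k in range(m)])
fast = multiInterp2(x, xp, fp)
print(np.allclose(reference, fast))
```

Note that with the clip, values of x outside [xp[0], xp[-1]] would be linearly extrapolated, whereas np.interp holds the end values constant, so the two only agree for x within the range of xp.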
    

    Timing tests with original data:

    %timeit multiInterp2(x, xp, fp)
    The slowest run took 6.87 times longer than the fastest. This could mean that an intermediate result is being cached.
    10000 loops, best of 3: 25.5 µs per loop
    
    %timeit np.concatenate([compiled_interp(x[[i]], xp, fp[i]) for i in range(fp.shape[0])])
    The slowest run took 4.03 times longer than the fastest. This could mean that an intermediate result is being cached.
    10000 loops, best of 3: 39.3 µs per loop
    

    It seems to be faster even for a small x.

    Let's try something much, much bigger:

    n = 10000
    m = 10000
    
    xp = np.linspace(0, 1, n)
    x = np.random.rand(m)
    fp = np.random.rand(m, n)
    
    %timeit b()  # kazemakase's above
    10 loops, best of 3: 38.4 ms per loop
    
    %timeit multiInterp2(x, xp, fp)
    100 loops, best of 3: 2.4 ms per loop
    

    The advantage grows with the problem size, even compared with the compiled version of np.interp (here 2.4 ms versus 38.4 ms).
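
The %timeit lines above need IPython. In a plain script, a comparable measurement could be sketched with the standard timeit module as below. The sizes are reduced on purpose (an assumption of this sketch), since a 10000 by 10000 float64 fp takes about 800 MB. The helper name interp_loop is hypothetical, and multiInterp2 again includes np.asarray and np.clip, which are additions not present in the answer as posted.

```python
import timeit
import numpy as np

def interp_loop(x, xp, fp):
    # Baseline: one np.interp call per row.
    return np.array([np.interp(x[k], xp, fp[k]) for k in range(x.size)])

def multiInterp2(x, xp, fp):
    xp = np.asarray(xp)
    i = np.arange(x.size)
    j = np.clip(np.searchsorted(xp, x) - 1, 0, xp.size - 2)
    d = (x - xp[j]) / (xp[j + 1] - xp[j])
    return (1 - d) * fp[i, j] + fp[i, j + 1] * d

rng = np.random.default_rng(0)   # arbitrary seed
m, n = 2000, 500                 # reduced sizes to keep memory modest
xp = np.linspace(0.0, 1.0, n)
x = rng.random(m)
fp = rng.random((m, n))

# Best of three single runs for each approach.
t_loop = min(timeit.repeat(lambda: interp_loop(x, xp, fp), number=1, repeat=3))
t_fast = min(timeit.repeat(lambda: multiInterp2(x, xp, fp), number=1, repeat=3))
print(f"loop: {t_loop * 1e3:.2f} ms, vectorized: {t_fast * 1e3:.2f} ms")
```

Absolute timings depend on the machine and NumPy version, so only the ratio between the two numbers is worth comparing.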