Tags: python, arrays, numpy, scipy, interpolation

How do I create a regularly spaced grid from irregular data that includes all the original data points? Is it even worth doing?


I have the data in the following form: an x array of longitudes, a y array of latitudes, and a z array of the rainfall at those latitudes and longitudes. I want to create a matrix to test different interpolation methods (many of which are not available by default in scipy or similar toolkits). x and y are unevenly spaced but have the same length (i.e. some part of x has the following values: 34.9912, 35.1568, 35.8881, but later on x contains 46, 45, 47). I want to create the coarsest possible regularly spaced matrix that contains each and every point. I have illustrated an example below:

Suppose x=[36,38,31], y=[12,19,15], and z=[1,2,3]. I want to create the following matrix:

0 0 0 0 0 0 0 2
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 1 0 0

where the positive values are located where the function takes values. How do I do this in Python or any other language? (I would prefer Python)
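For the toy example above, such a matrix can be built directly with NumPy fancy indexing — a minimal sketch, assuming integer coordinates at unit spacing, with rows running from the largest y at the top:

```python
import numpy as np

x = np.array([36, 38, 31])
y = np.array([12, 19, 15])
z = np.array([1, 2, 3])

# Regular axes spanning the data at unit spacing
xreg = np.arange(x.min(), x.max() + 1)
yreg = np.arange(y.min(), y.max() + 1)

# Map each (x, y) pair to its row/column; rows count down from y.max()
grid = np.zeros((len(yreg), len(xreg)))
rows = y.max() - y
cols = x - x.min()
grid[rows, cols] = z
print(grid)
```

The coordinate-to-index mapping is just a shift, so this stays O(n) in the number of samples regardless of grid size.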

Someone asked a similar question here and was told to do the following:

import numpy as np
from scipy import interpolate

f1 = interpolate.interp1d(x, y)
x_new = np.linspace(x[0], x[-1], max_len)  # max_len: desired number of samples
y_new = f1(x_new)

However, this doesn't work because the new axis doesn't include the original points. Also, scipy.interpolate.interp1d is considered legacy and is not recommended for new code, so I'm looking for a more modern solution.
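One way to guarantee the original samples survive is to merge them into the target axis with np.union1d before evaluating — a sketch using np.interp for plain linear interpolation (the sample count 15 is arbitrary):

```python
import numpy as np

# Sorted sample locations and values (np.interp requires sorted x)
x = np.array([31.0, 36.0, 38.0])
y = np.array([3.0, 1.0, 2.0])

# Merge a regular axis with the original sample locations;
# union1d returns the sorted union, so every original x survives
x_new = np.union1d(np.linspace(x[0], x[-1], 15), x)
y_new = np.interp(x_new, x, y)

# The original points are reproduced exactly
assert np.allclose(np.interp(x, x, y), y)
```

Because the union is sorted and deduplicated, the result is a valid axis for any 1-D interpolator, not just np.interp.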

The above example is obviously a toy example with integer gaps. Since the real data contains points to the fourth decimal, I can foresee that I would need to make at least 10000 points just to move forward an integer, and this would very quickly get unwieldy when dealing with real data, which may, eventually, encompass the entire world. It's not storing the data but the interpolation which would take time, so I'm looking to reduce the computation time as much as possible.

However, I am not sure whether this is the best way to approach the problem. Examples on the internet suggest using griddata or something similar to generate a grid quickly. However, this has the disadvantage of producing an unevenly spaced grid. Would using something like linspace(x[0], x[-1], 10000) be the right approach? I foresee endless hours of waiting while I'm testing my interpolation if I end up using my proposed method, and if it turns out that it could be done in an easier way, then I don't want to waste my time.
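For reference, scipy.interpolate.griddata interpolates scattered samples onto whatever target grid you construct, so the target resolution is entirely your choice — a sketch using the toy data and nearest-neighbour interpolation (linear would leave NaN outside the convex hull of three points):

```python
import numpy as np
from scipy.interpolate import griddata

x = np.array([36.0, 38.0, 31.0])
y = np.array([12.0, 19.0, 15.0])
z = np.array([1.0, 2.0, 3.0])

# Target grid: regularly spaced, at whatever resolution you choose
xi = np.linspace(x.min(), x.max(), 8)
yi = np.linspace(y.min(), y.max(), 8)
XI, YI = np.meshgrid(xi, yi)

# Nearest-neighbour keeps the original values at the sample locations
ZI = griddata((x, y), z, (XI, YI), method='nearest')
print(ZI.shape)  # (8, 8)
```

The cost scales with the number of target points, so a coarse grid is cheap to experiment on and the resolution can be raised only once a method is chosen.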

Edit 1: As suggested by @Reinderien, here are the histograms for the latitudes and longitudes:

[histogram: latitudes]

[histogram: longitudes]

Here are the histograms obtained by np.diff(np.sort(...)):

[histogram: latitude differences]

[histogram: longitude differences]

Edit 2: better binned plots for the second part:

[binned plot: latitude differences]

[binned plot: longitude differences]


Solution

  • Is it even worth doing?

    That depends on a lot of things, including how many points you have, what precision you need and what memory impact you can sustain.

    Fundamentally this is a GCD problem; the following produces the coarsest regular axes:

    import numpy as np
    
    x = np.array([36, 38, 30])
    y = np.array([12, 19, 15])
    z = np.array([1, 2, 3])
    
    # Coarsest spacing that hits every sample: the GCD of the sorted differences
    xspace = np.gcd.reduce(np.diff(np.sort(x)))
    yspace = np.gcd.reduce(np.diff(np.sort(y)))
    
    # Regular axes from min to max, inclusive, at that spacing
    xreg = np.arange(x.min(), x.max() + xspace, xspace)
    yreg = np.arange(y.min(), y.max() + yspace, yspace)
    print(xreg)
    print(yreg)

    Output:
    
    [30 32 34 36 38]
    [12 13 14 15 16 17 18 19]
    

    The differential over both dimensions is fed into a GCD that gives you your coarsest spacing. Note that I've changed your input data to be "more interesting" so that xspace takes a value of 2; otherwise both of your example arrays have differentials for which the greatest common divisor is 1.

    For floating-point values such as your 34.9912 you're going to need to multiply by your precision factor (1e4?) and coerce to an integer. The higher your precision factor, the more massive your regularized grid may become.
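    A sketch of that scaling, using the fractional longitudes from the question and an assumed precision factor of 1e4 (four decimal places):

    ```python
    import numpy as np

    # Fractional longitudes from the question, to four decimal places
    x = np.array([34.9912, 35.1568, 35.8881])
    scale = 10_000  # precision factor: 1e4 for four decimals

    # Scale to integers so np.gcd applies, then scale the axis back down
    xi = np.round(x * scale).astype(np.int64)
    xspace = np.gcd.reduce(np.diff(np.sort(xi)))
    xreg = np.arange(xi.min(), xi.max() + xspace, xspace) / scale

    # Every original value lands exactly on the regular axis
    assert np.all(np.isin(xi, np.round(xreg * scale).astype(np.int64)))
    ```

    For these three values the integer GCD happens to be 1, so the axis needs one point per 0.0001 degrees between the extremes — a concrete illustration of how the grid can blow up when the data don't share a coarser common spacing.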