Search code examples
pythonpython-3.xpandasinterpolation

Interpolation of a pandas DataFrame


I do have a pandas DataFrame (size = 34,19) which I want to use as a lookup table.

But the values I want to look up are "between" the values in the dataframe

For example:

      0.1      0.2      0.3     0.4    0.5
0.1   4.01  31.86   68.01   103.93  139.2
0.2  24.07  57.49   91.37   125.21  158.57
0.3  44.35  76.4    108.97  141.57  173.78
0.4  59.66  91.02   122.8   154.62  186.13
0.5  87.15  117.9   148.86  179.83  210.48
0.6  106.92 137.41  168.26  198.99  229.06
0.7  121.73 152.48  183.4   213.88  243.33

I know want to look up the value for x = 5.5 y = 1.004, so the answer should be around 114.

I tried it with different methods from scipy but the values I get are always way off. Last method I used was :inter = interpolate.interpn([np.array(np.arange(34)), np.array(np.arange(19))], np_matrix, [x_value, y_value],)
I even get wrong values for points in the grid which do exist.

Can someone tell me what I'm doing wrong or recommend an easy solution for the task?


EDIT:

An additional problem is: My raw data, from an .xlsx file, look like:

      0.1      0.2      0.3     0.4    0.5
0.1   4.01  31.86   68.01   103.93  139.2
0.2  24.07  57.49   91.37   125.21  158.57
0.3  44.35  76.4    108.97  141.57  173.78
0.4  59.66  91.02   122.8   154.62  186.13
0.5  87.15  117.9   148.86  179.83  210.48
0.6  106.92 137.41  168.26  198.99  229.06
0.7  121.73 152.48  183.4   213.88  243.33

But pandas adds an Index column:

               0.1      0.2      0.3     0.4    0.5
0     0.1     4.01    31.86    68.01    103.93  139.2
1     0.2    24.07    57.49    91.37    125.21  158.57
2     0.3    44.35    76.4    108.97    141.57  173.78
3     0.4    59.66    91.02   122.8     154.62  186.13
4     0.8    87.15   117.9    148.86    179.83  210.48
5     1.0   106.92   137.41   168.26    198.99  229.06
6     1.7   121.73   152.48   183.4     213.88  243.33

So if I want to access x = 0.4 y = 0.15 I have to input x = 3, y = 0.15.

Data are read with:

model_references = pd.ExcelFile(model_references_path)
Matrix = model_references.parse('Model_References')
n = Matrix.stack().reset_index().values
out = interpolate.griddata(n[:,0:2], n[:,2], (Stroke, Current), method='cubic')

Solution

  • You can reshape data to 3 columns with stack - first column for index, second for columns and last for values, last get values by scipy.interpolate.griddata

    from scipy.interpolate import griddata
    
    a = 5.5
    b = 1.004
    
    n = df.stack().reset_index().values    
    #https://stackoverflow.com/a/8662243
    out = griddata(n[:,0:2], n[:,2], [(a, b)], method='linear')
    print (out)
    [104.563]
    

    Detail:

    n = df.stack().reset_index().values
    print (n)
    [[  1.     1.     4.01]
     [  1.     2.    31.86]
     [  1.     3.    68.01]
     [  1.     4.   103.93]
     [  1.     5.   139.2 ]
     [  2.     1.    24.07]
     [  2.     2.    57.49]
     [  2.     3.    91.37]
     [  2.     4.   125.21]
     [  2.     5.   158.57]
     [  3.     1.    44.35]
     [  3.     2.    76.4 ]
     [  3.     3.   108.97]
     [  3.     4.   141.57]
     [  3.     5.   173.78]
     [  4.     1.    59.66]
     [  4.     2.    91.02]
     [  4.     3.   122.8 ]
     [  4.     4.   154.62]
     [  4.     5.   186.13]
     [  5.     1.    87.15]
     [  5.     2.   117.9 ]
     [  5.     3.   148.86]
     [  5.     4.   179.83]
     [  5.     5.   210.48]
     [  5.     1.   106.92]
     [  5.     2.   137.41]
     [  5.     3.   168.26]
     [  5.     4.   198.99]
     [  5.     5.   229.06]
     [  6.     1.   121.73]
     [  6.     2.   152.48]
     [  6.     3.   183.4 ]
     [  6.     4.   213.88]
     [  6.     5.   243.33]]