Search code examples
pythonpandasinterpolationreindex

How to "stretch" my dataframe and interpolate in between existing values


I'm stuck with a simple DataFrame.reindex().interpolate() problem because the dataframes I am using don't have a datetime index.

I have DataFrame1: t that looks like this:

In[1]: import pandas as pd
t = pd.DataFrame({'D18O': [-0.47, -0.12,  0.55,  0.72,  1.8 ,  1.1 ,  0.43, -0.29, -0.55,
       -0.6 , -0.32,  0.28,  0.72,  1.1 ,  1.34,  1.32,  1.11,  0.46,
        0.09,  0.02]})

Out[2]: 
1    -0.47
2    -0.12
3     0.55
4     0.72
5     1.80
6     1.10
7     0.43
8    -0.29
9    -0.55
10   -0.60
11   -0.32
12    0.28
13    0.72
14    1.10
15    1.34
16    1.32
17    1.11
18    0.46
19    0.09
20    0.02
Name: D18O, dtype: float64

I want to "stretch" it to 430 rows by evenly spacing each row and linearly interpolating values in between. This is because my DataFrame2: env has 430 rows and I want to do some later analysis that needs both frames to have the same dimension.

In[2]: env.index
Out[49]: RangeIndex(start=0, stop=430, step=1)

I've tried reindexing and interpolating in many combinations but just can't find the right method. I think the problem is, that 430 is not evenly divisable by 19/20.

new_idx = np.linspace(t.index[0], t.index[-1], env.shape[0])
t.reindex(new_idx).interpolate()

I thought this qould work, but because the indexes are not even it skips most of the values in t and leaves me with a nearly empty new dataframe.

For the reindexing step I expect something like:

In[3]: t['D18O']
Out[3]: 
0          0.47
2.13157     NaN
2.26315     NaN
...         ...
21.5      -0.12
22.63157    NaN
23.76315    NaN
...         ...
...         ...
430        0.02
Name: D18O, dtype: float64

The indexes don't really matter, as long as the values are evenly spaced and the number of rows matches the number of rows in env.


Solution

  • You can use parameter ffill with limit in DataFrame.reindex, but thre is problem with duplicates first value, so possible solution is add first helper value close 0 to index, reindex, remove it by iloc and last interpolate:

    r = pd.RangeIndex(0, 430, 1)
    
    t.loc[-0.001] = 0
    t = t.sort_index()
    new_idx = np.linspace(t.index[0], t.index[-1], len(r))
    print (t.reindex(new_idx, method='ffill', limit=1).iloc[1:].interpolate())
    
                   D18O
    0.043291  -0.470000
    0.087583  -0.454091
    0.131874  -0.438182
    0.176166  -0.422273
    0.220457  -0.406364
    0.264748  -0.390455
    0.309040  -0.374545
    0.353331  -0.358636
    0.397622  -0.342727
    0.441914  -0.326818
    0.486205  -0.310909
    0.530497  -0.295000
    0.574788  -0.279091
    0.619079  -0.263182
    0.663371  -0.247273
    0.707662  -0.231364
    0.751953  -0.215455
    ...
    ...