Python Pandas: Groupby multiple columns and linearly interpolate values of column Y based on another X column

Consider the following pandas dataframe

    reference   sicovam     label       id      date        TTM price
0   SCOM_WTI    68801903    WTI Nymex   BBG:CL  2015-01-02  18  52.69
1   SCOM_WTI    68801903    WTI Nymex   BBG:CL  2015-01-02  30  NaN
2   SCOM_WTI    68801903    WTI Nymex   BBG:CL  2015-01-02  49  53.11
3   SCOM_WTI    68801903    WTI Nymex   BBG:CL  2015-01-02  60  NaN
4   SCOM_WTI    68801903    WTI Nymex   BBG:CL  2015-01-02  77  53.69
5   SCOM_WTI    68801903    WTI Nymex   BBG:CL  2015-01-02  90  NaN
6   SCOM_WTI    68801903    WTI Nymex   BBG:CL  2015-01-02  109 54.42
7   SCOM_WTI    68801903    WTI Nymex   BBG:CL  2015-01-02  137 55.15
8   SCOM_WTI    68801903    WTI Nymex   BBG:CL  2015-01-02  171 55.80
9   SCOM_WTI    68801903    WTI Nymex   BBG:CL  2015-01-02  180 NaN
10  SCOM_WTI    68801903    WTI Nymex   BBG:CL  2015-01-05  15  50.04
11  SCOM_WTI    68801903    WTI Nymex   BBG:CL  2015-01-05  30  NaN
12  SCOM_WTI    68801903    WTI Nymex   BBG:CL  2015-01-05  46  50.52
13  SCOM_WTI    68801903    WTI Nymex   BBG:CL  2015-01-05  60  NaN
14  SCOM_WTI    68801903    WTI Nymex   BBG:CL  2015-01-05  74  51.17
15  SCOM_WTI    68801903    WTI Nymex   BBG:CL  2015-01-05  90  NaN
16  SCOM_WTI    68801903    WTI Nymex   BBG:CL  2015-01-05  106 51.95
17  SCOM_WTI    68801903    WTI Nymex   BBG:CL  2015-01-05  134 52.73
18  SCOM_WTI    68801903    WTI Nymex   BBG:CL  2015-01-05  168 53.46
19  SCOM_WTI    68801903    WTI Nymex   BBG:CL  2015-01-05  180 NaN

After grouping by the reference, sicovam, label, id and date columns, I would like to fill the NaN values of the price column via linear interpolation over the TTM value i.e., in the context of the linear interpolation formula, price is the y and TTM is the x variable.

So far, I built the following lines.

def intepolate_group(group):
    group["price"] = group["price"].interpolate(method='linear', limit_direction='both', axis=0)
    return group

new_df = df.groupby(["reference","sicovam","label","id","date"])[["TTM","price"]].apply(intepolate_group)

Nevertheless, the result that I get is the linear interpolation over the index numbers per group. For example for the following part of the dataset, I get 54.06 instead of 53.99. What do I still need in order to interpolate over the TTM variable?

PS: I want to avoid masking via loop (instead of grouping) and setting the TTM as the index, because the dataframe is quite big and such a scenario takes considerable amount of time.

4   SCOM_WTI    68801903    WTI Nymex   BBG:CL  2015-01-02  77  53.69
5   SCOM_WTI    68801903    WTI Nymex   BBG:CL  2015-01-02  90  NaN
6   SCOM_WTI    68801903    WTI Nymex   BBG:CL  2015-01-02  109 54.42

Solution

Here's one approach:

cols = ['reference', 'sicovam', 'label', 'id', 'date']

df['price'] = (
    df.set_index('TTM')
    .groupby(cols)['price']
    .transform(lambda x: x.interpolate(method='index'))
    .values
    )

Output:

df.iloc[4:7]

            reference sicovam  label      id        date  TTM      price
4 SCOM_WTI   68801903     WTI  Nymex  BBG:CL  2015-01-02   77  53.690000
5 SCOM_WTI   68801903     WTI  Nymex  BBG:CL  2015-01-02   90  53.986562
6 SCOM_WTI   68801903     WTI  Nymex  BBG:CL  2015-01-02  109  54.420000

Explanation

Set column 'TTM' as the index with df.set_index and apply df.groupby.
Use groupby.transform and use pd.Series.interpolate with method='index'.
Finally, chain Series.values to assign the result back to df['price'].