Pandas interpolate within a groupby for one column


This is similar to the question "Pandas interpolate within a groupby", but the answer there applies interpolate() to all columns. How do I limit interpolate() to a single column (here val2)?

Input

    filename    val1    val2
t                   
1   file1.csv   5       10
2   file1.csv   NaN     NaN
3   file1.csv   15      20
6   file2.csv   NaN     NaN
7   file2.csv   10      20
8   file2.csv   12      15
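To run the examples without the clipboard, the input above can be rebuilt directly. This is a minimal sketch; it assumes val1 and val2 are numeric (the NaN values make them float columns) and that "t" is the index.

```python
import numpy as np
import pandas as pd

# Rebuild the sample input shown above, using "t" as the index.
df = pd.DataFrame(
    {
        "filename": ["file1.csv"] * 3 + ["file2.csv"] * 3,
        "val1": [5, np.nan, 15, np.nan, 10, 12],
        "val2": [10, np.nan, 20, np.nan, 20, 15],
    },
    index=pd.Index([1, 2, 3, 6, 7, 8], name="t"),
)
print(df)
```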

Expected Output

    filename    val1    val2
t                   
1   file1.csv   5       10
2   file1.csv   NaN     15
3   file1.csv   15      20
6   file2.csv   NaN     NaN
7   file2.csv   10      20
8   file2.csv   12      15

This attempt returns only the val2 column, not the rest of the columns:

df = df.groupby('filename').apply(lambda group: group['val2'].interpolate(method='index'))
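The attempt returns only val2 because the lambda returns a single Series per group, so the combined result is a Series of val2 values. One way to keep the other columns, sketched here on a rebuilt copy of the sample data, is `transform`: its result is aligned to the original index, so it can be written back into val2 in place. The sketch keeps the OP's `method='index'`, so the values of `t` drive the interpolation.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "filename": ["file1.csv"] * 3 + ["file2.csv"] * 3,
        "val1": [5, np.nan, 15, np.nan, 10, 12],
        "val2": [10, np.nan, 20, np.nan, 20, 15],
    },
    index=pd.Index([1, 2, 3, 6, 7, 8], name="t"),
)

# transform() returns a Series aligned to df's original index, so it can be
# written straight back into "val2"; "filename" and "val1" are left untouched.
df["val2"] = df.groupby("filename")["val2"].transform(
    lambda s: s.interpolate(method="index")
)
print(df)
```

The resulting values match the Expected Output above: only val2 at t=2 is filled, and val2 at t=6 stays NaN because it is the first row of its group.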

Solution

  • A direct approach:

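    import pandas as pd

    df = pd.read_clipboard()  # clipboard contains the OP's sample data
    # interpolate only column "val2", separately within each "filename" group;
    # transform() keeps the original index, so the result aligns with df
    df["val2_interpolated"] = (
        df.groupby("filename")["val2"]
        .transform(lambda s: s.interpolate(method="linear"))
    )

    returns:

        filename  val1  val2  val2_interpolated
    t
    1  file1.csv   5.0  10.0               10.0
    2  file1.csv   NaN   NaN               15.0
    3  file1.csv  15.0  20.0               20.0
    6  file2.csv   NaN   NaN                NaN
    7  file2.csv  10.0  20.0               20.0
    8  file2.csv  12.0  15.0               15.0

    Row t=6 stays NaN: it is the first row of file2.csv, so there is no earlier value in that group to interpolate from.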
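The same pattern extends to interpolating a chosen subset of columns rather than one or all of them. A minimal sketch, assuming both val1 and val2 should be filled; the list `cols` is illustrative and can hold any numeric columns:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "filename": ["file1.csv"] * 3 + ["file2.csv"] * 3,
        "val1": [5, np.nan, 15, np.nan, 10, 12],
        "val2": [10, np.nan, 20, np.nan, 20, 15],
    },
    index=pd.Index([1, 2, 3, 6, 7, 8], name="t"),
)

# Columns to interpolate; everything else (here "filename") is left as-is.
cols = ["val1", "val2"]

# Interpolate each selected column within its "filename" group; the result
# keeps df's index, so it can be assigned back to the same columns.
df[cols] = df.groupby("filename")[cols].transform(
    lambda g: g.interpolate(method="index")
)
print(df)
```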