This is very similar to these two topics: Follow-up question and Original question.
I need to interpolate not over all columns, nor over a single column, but over a set of columns (for instance two). Moreover, I need to interpolate at a target value that is, most of the time, missing from the reference column used for the interpolation. A small example will clarify:
Original dataframe:
   input_date  pivot_column  val1   val2
t
1  2023/11/02         16.00  3300  37.00
2  2023/11/02         14.00  3100  33.00
3  2023/11/02         10.00  3000  32.70
4  2023/11/03         17.50  3350  36.11
5  2023/11/03         15.75  3200  33.30
6  2023/11/03         14.25  3000  31.30
For each unique value in the column input_date, I would like to compute the interpolated values of columns val1 and val2 at a hypothetical pivot_column value of 15, so something like:
   input_date  pivot_column  val1   val2
t
1  2023/11/02            15  3200  35.0
2  2023/11/03            15  3100  32.3
I can use the solution from the follow-up question thread, i.e.

df["val2_interpolated"] = (
    df[["filename", "val2"]]
    .groupby("filename")
    .apply(lambda x: x)  # identity apply, as in the linked answer
    .interpolate(method="linear")["val2"]
)

but how can I force the interpolation to be done around a pivot/input value of 15, for instance? Do I need to apply my own interpolation function to a dataframe where I add a new row with 15 in the column pivot_column? Or, after grouping the dataframe, apply np.interp?
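The second idea you mention can be sketched directly: group by input_date, sort each group by the pivot, and call np.interp at the target value. This is a minimal sketch, assuming the column names from your example; np.interp requires the x-values to be ascending, hence the sort:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "input_date": ["2023/11/02"] * 3 + ["2023/11/03"] * 3,
    "pivot_column": [16, 14, 10, 17.5, 15.75, 14.25],
    "val1": [3300, 3100, 3000, 3350, 3200, 3000],
    "val2": [37, 33, 32.7, 36.11, 33.3, 31.3],
})

def interp_at(g, x=15):
    # np.interp expects ascending x-values, so sort by the pivot first
    g = g.sort_values("pivot_column")
    return pd.Series({
        "val1": np.interp(x, g["pivot_column"], g["val1"]),
        "val2": np.interp(x, g["pivot_column"], g["val2"]),
    })

# select the columns explicitly so the grouping column is not passed to apply
out = df.groupby("input_date")[["pivot_column", "val1", "val2"]].apply(interp_at)
```

out is then a dataframe indexed by input_date with one interpolated row per date.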
I would concat the new rows with the value of 15 in pivot_column, and then interpolate:
# one row per date carrying the target pivot value (val1/val2 start as NaN)
other = pd.DataFrame({"input_date": df["input_date"].unique(), "pivot_column": 15})
df2 = pd.concat([df, other])
# per date: index by the pivot, sort it (interpolate("index") expects a
# monotonic index), then fill the NaNs linearly against the index values
df3 = (
    df2.groupby("input_date")
    .apply(lambda g: g.set_index("pivot_column").sort_index().interpolate("index"))
    .drop(columns="input_date")
    .reset_index()
)
# keep only the required rows (pivot_column == 15)
output = other.merge(df3)
>>> output
input_date pivot_column val1 val2
0 2023/11/02 15 3200.0 35.0
1 2023/11/03 15 3100.0 32.3
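For reference, here is the same pipeline as a self-contained script on the sample data from the question, with the group columns selected explicitly so the grouping key is never part of the applied frame (which also makes the intermediate drop unnecessary):

```python
import pandas as pd

df = pd.DataFrame({
    "input_date": ["2023/11/02"] * 3 + ["2023/11/03"] * 3,
    "pivot_column": [16, 14, 10, 17.5, 15.75, 14.25],
    "val1": [3300, 3100, 3000, 3350, 3200, 3000],
    "val2": [37, 33, 32.7, 36.11, 33.3, 31.3],
})

# one row per date with the target pivot value; val1/val2 are NaN for now
other = pd.DataFrame({"input_date": df["input_date"].unique(),
                      "pivot_column": 15.0})
df2 = pd.concat([df, other])

# per date: index by the pivot, sort, and interpolate against the index values
df3 = (
    df2.groupby("input_date")[["pivot_column", "val1", "val2"]]
    .apply(lambda g: g.set_index("pivot_column").sort_index().interpolate("index"))
    .reset_index()
)

# keep only the rows at the target pivot value
output = other.merge(df3)
```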