I have a dataframe with multiple columns
df = pd.DataFrame({
'Date': [1930, 1931, 1932, 1933,1934],
'File1': [np.nan, 72, 58, 280, 958],
'File2': [22, np.nan, np.nan, np.nan, 13],
'File3': [np.nan, np.nan, np.nan, 18, 137],
'File4': [np.nan, 552, 1377, 280, np.nan],
'File5': [45, 1252, np.nan, 1841, np.nan],
})
I want to perform interpolation on the entire dataframe (using FillMean for example). However the files have different start and end dates. So I need to say interpolate (using FillMean ) between the first value and the last value of that file (It should not interpolate before the start date (first value in the file) or after the end date(last value in the file)
For example for:
expected output
Date File1 File2 File3 File4 File5
0 1930 NaN 22.00 NaN 280.0 45.0
1 1931 72.0 19.75 NaN 552.0 1252.0
2 1932 58.0 17.50 NaN 1377.0 1546.5
3 1933 280.0 15.25 18.0 280.0 1841.0
4 1934 958.0 13.00 137.0 NaN NaN
Use DataFrame.interpolate
with parameter limit_are='inside'
:
df = df.interpolate(limit_are='inside').mask(df.bfill().isna())
print (df)
Date File1 File2 File3 File4 File5
0 1930 NaN 22.00 NaN NaN 45.0
1 1931 72.0 19.75 NaN 552.0 1252.0
2 1932 58.0 17.50 NaN 1377.0 1546.5
3 1933 280.0 15.25 18.0 280.0 1841.0
4 1934 958.0 13.00 137.0 NaN NaN