Search code examples
pythonpandasdataframeinterpolationlinear-interpolation

Interporlating multiple columns in a dataframe


I have a dataframe with multiple columns

df = pd.DataFrame({
'Date': [1930, 1931, 1932, 1933,1934],
'File1': [np.nan, 72, 58, 280, 958],
'File2': [22, np.nan, np.nan, np.nan, 13],
'File3': [np.nan, np.nan, np.nan, 18, 137],
'File4': [np.nan, 552, 1377, 280, np.nan],
'File5': [45, 1252, np.nan, 1841, np.nan],
}) 

I want to perform interpolation on the entire dataframe (using FillMean for example). However the files have different start and end dates. So I need to say interpolate (using FillMean ) between the first value and the last value of that file (It should not interpolate before the start date (first value in the file) or after the end date(last value in the file)

For example for:

  • file 1 no interpolation should take place since there are no missing value after the start date.
  • File 2 there should be interpolation as there are blank value between start date and end date
    And so on…

expected output

   Date  File1  File2  File3   File4   File5
0  1930    NaN  22.00    NaN   280.0    45.0
1  1931   72.0  19.75    NaN   552.0  1252.0
2  1932   58.0  17.50    NaN  1377.0  1546.5
3  1933  280.0  15.25   18.0   280.0  1841.0
4  1934  958.0  13.00  137.0     NaN     NaN

Solution

  • Use DataFrame.interpolate with parameter limit_are='inside':

    df = df.interpolate(limit_are='inside').mask(df.bfill().isna())
    print (df)
       Date  File1  File2  File3   File4   File5
    0  1930    NaN  22.00    NaN     NaN    45.0
    1  1931   72.0  19.75    NaN   552.0  1252.0
    2  1932   58.0  17.50    NaN  1377.0  1546.5
    3  1933  280.0  15.25   18.0   280.0  1841.0
    4  1934  958.0  13.00  137.0     NaN     NaN