Tags: python, pandas, dataframe, interpolation, fillna

Pandas dynamically replace nan values


I have a DataFrame that looks like this:

import numpy as np
import pandas as pd

df = pd.DataFrame({'a':[1,2,np.nan,1,np.nan,np.nan,4,2,3,np.nan],
    'b':[4,2,3,np.nan,np.nan,1,5,np.nan,5,8]
})

   a    b
0  1.0  4.0
1  2.0  2.0
2  NaN  3.0
3  1.0  NaN
4  NaN  NaN
5  NaN  1.0
6  4.0  5.0
7  2.0  NaN
8  3.0  5.0
9  NaN  8.0

I want to dynamically replace the NaN values. I have tried (df.ffill()+df.bfill())/2, but that does not yield the desired output: it computes the fill values for the whole column at once rather than one at a time, so consecutive NaN values all receive the same value. I have also tried interpolate, but it doesn't work well for non-linear data.
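To illustrate the problem with that attempt, here is a minimal sketch on column a alone (the names a and averaged are only for illustration):

```python
import numpy as np
import pandas as pd

a = pd.Series([1, 2, np.nan, 1, np.nan, np.nan, 4, 2, 3, np.nan], name="a")

# Average of the forward-filled and backward-filled column, computed in one shot.
averaged = (a.ffill() + a.bfill()) / 2
print(averaged.tolist())
# [1.0, 2.0, 1.5, 1.0, 2.5, 2.5, 4.0, 2.0, 3.0, nan]
```

The two consecutive NaN values at positions 4 and 5 both become 2.5, and the trailing NaN stays NaN because bfill has no later value to copy.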

I have seen this answer, but I did not fully understand it and am not sure whether it would work.

Update on the computation of the values
I want every NaN value to be the mean of the previous and next non-NaN values. When several NaN values occur in sequence, I want to replace them one at a time, so that each filled value feeds into the next computation. For example, given 1, np.nan, np.nan, 4, the first NaN becomes the mean of 1 and 4 (2.5), giving 1, 2.5, np.nan, 4; the second NaN then becomes the mean of 2.5 and 4, giving 1, 2.5, 3.25, 4.
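The rule above can be sketched in plain Python on the small example. This is only an illustration: fill_sequentially and its default argument are hypothetical names, and using 0 for a missing neighbour is an assumption inferred from row 9 of column a in the desired output (1.50 = (3 + 0) / 2).

```python
import math

def fill_sequentially(values, default=0.0):
    """Fill each NaN with the mean of its left neighbour (already filled)
    and the next non-NaN value to its right; `default` stands in for a
    missing neighbour (an assumption, see above)."""
    out = list(values)
    for i, v in enumerate(out):
        if math.isnan(v):
            prev = out[i - 1] if i > 0 else default
            nxt = next((x for x in out[i + 1:] if not math.isnan(x)), default)
            out[i] = (prev + nxt) / 2
    return out

print(fill_sequentially([1, float("nan"), float("nan"), 4]))
# [1, 2.5, 3.25, 4]
```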

The desired output is

    a    b
0  1.00  4.0
1  2.00  2.0
2  1.50  3.0
3  1.00  2.0
4  2.50  1.5
5  3.25  1.0
6  4.00  5.0
7  2.00  5.0
8  3.00  5.0
9  1.50  8.0

Solution

  • Inspired by @ye olde noobe's answer (thanks to him!):

    I've optimized it to make it ≃ 100x faster (times comparison below):

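    def custom_fillna(s: pd.Series) -> pd.Series:
        # Work on a copy: writing into df['a'] in place is unreliable across pandas versions.
        s = s.copy()
        for i in range(len(s)):
            if pd.isna(s.iloc[i]):
                prev_idx = s.iloc[:i].last_valid_index()
                next_idx = s.iloc[i:].first_valid_index()
                # Fall back to 0 when there is no valid value on one side.
                last_valid_number = s.loc[prev_idx] if prev_idx is not None else 0
                next_valid_number = s.loc[next_idx] if next_idx is not None else 0
                s.iloc[i] = (last_valid_number + next_valid_number) / 2
        return s

    df['a'] = custom_fillna(df['a'])
    df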
    

    Times comparison:

    [Image: timing comparison chart, not available]
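    As a further possibility, not part of the original answer and not benchmarked here, the same fill could be sketched as a single pass over the column. The names sequential_mean_fill and default are hypothetical, and the default of 0 simply mirrors the fallback used in custom_fillna. The idea is that the "previous" value is always the element just before the current one (already filled), while the "next" value can be precomputed with bfill on the original column.

```python
import numpy as np
import pandas as pd

def sequential_mean_fill(s: pd.Series, default: float = 0.0) -> pd.Series:
    values = s.to_numpy(dtype=float, copy=True)
    # Next original non-NaN value at or after each position; `default` if none exists.
    next_valid = s.bfill().fillna(default).to_numpy(dtype=float)
    prev = default  # stands in for a missing left neighbour
    for i in range(len(values)):
        if np.isnan(values[i]):
            values[i] = (prev + next_valid[i]) / 2
        prev = values[i]  # filled values feed into the next computation
    return pd.Series(values, index=s.index, name=s.name)

df = pd.DataFrame({
    'a': [1, 2, np.nan, 1, np.nan, np.nan, 4, 2, 3, np.nan],
    'b': [4, 2, 3, np.nan, np.nan, 1, 5, np.nan, 5, 8],
})

# Apply the fill column by column.
filled = df.apply(sequential_mean_fill)
print(filled)
```

    On the example frame this should reproduce the desired output from the question for both columns.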