I have a Pandas dataframe containing several NaNs, like the following one:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.array([[1,np.nan,np.nan, 2, 3, np.nan], [np.nan,np.nan,0,0,np.nan,2], [np.nan,4,np.nan,np.nan,5,np.nan]]).transpose())
print(df)
     0    1    2
0  1.0  NaN  NaN
1  NaN  NaN  4.0
2  NaN  0.0  NaN
3  2.0  0.0  NaN
4  3.0  NaN  5.0
5  NaN  2.0  NaN
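Now I would like to fill the NaN values while respecting two restrictions:
1. A NaN is replaced by the preceding non-NaN value of its column.
2. NaNs before the first or after the last valid value of a column stay untouched.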
Desired solution:
     0    1    2
0  1.0  NaN  NaN
1  1.0  NaN  4.0
2  1.0  0.0  4.0
3  2.0  0.0  4.0
4  3.0  0.0  5.0
5  NaN  2.0  NaN
I found the Pandas interpolate function, which sounded quite promising, but unfortunately I can only satisfy one of the two restrictions at a time.
When I use
df_padded = df.interpolate(method='pad')
the right values are used (the preceding number of the respective column), but the NaNs at the end of columns 0 and 2 are also replaced, which I do not intend.
     0    1    2
0  1.0  NaN  NaN
1  1.0  NaN  4.0
2  1.0  0.0  4.0
3  2.0  0.0  4.0
4  3.0  0.0  5.0
5  3.0  2.0  5.0
When I use
df_padded = df.interpolate(limit_area='inside')
it replaces the right NaNs, but it interpolates linearly, which is also not what I want.
       0    1      2
0  1.000  NaN    NaN
1  1.333  NaN  4.000
2  1.667  0.0  4.333
3  2.000  0.0  4.667
4  3.000  1.0  5.000
5    NaN  2.0    NaN
So I thought using both parameters in the function call would give me the right output:
df_padded = df.interpolate(method='pad', limit_area='inside')
but instead nothing happens at all; the dataframe comes back unchanged.
     0    1    2
0  1.0  NaN  NaN
1  NaN  NaN  4.0
2  NaN  0.0  NaN
3  2.0  0.0  NaN
4  3.0  NaN  5.0
5  NaN  2.0  NaN
What am I doing wrong?
Ideally the interpolate solution should have worked; I believe it might be a bug. Nevertheless, here is an alternative approach that you can use:
s1, s2 = df.ffill(), df.bfill()
result = s1.mask(s1.isna() | s2.isna())
Result
     0    1    2
0  1.0  NaN  NaN
1  1.0  NaN  4.0
2  1.0  0.0  4.0
3  2.0  0.0  4.0
4  3.0  0.0  5.0
5  NaN  2.0  NaN
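Why this works: after ffill, a NaN can only remain above the first valid value of a column, and after bfill, only below the last valid value. Masking every cell where either fill is still NaN therefore leaves exactly the "inside" gaps forward-filled. Below is a minimal, self-contained sketch of the same idea wrapped in a helper; the name ffill_inside is made up for illustration and is not part of Pandas.

```python
import numpy as np
import pandas as pd


def ffill_inside(frame: pd.DataFrame) -> pd.DataFrame:
    """Forward-fill NaNs only between the first and last valid value of each column.

    Hypothetical helper name, not a Pandas API.
    """
    forward = frame.ffill()   # NaN remains only above a column's first valid value
    backward = frame.bfill()  # NaN remains only below a column's last valid value
    # A cell lies outside the valid range if either fill left it as NaN.
    outside = forward.isna() | backward.isna()
    # Keep the forward-filled values, but reset the outside cells to NaN.
    return forward.mask(outside)


# Same data as in the question, built column by column.
df = pd.DataFrame({
    0: [1, np.nan, np.nan, 2, 3, np.nan],
    1: [np.nan, np.nan, 0, 0, np.nan, 2],
    2: [np.nan, 4, np.nan, np.nan, 5, np.nan],
})

result = ffill_inside(df)
print(result)
```

The forward.isna() part of the condition is not strictly needed, since those cells are NaN in the forward-filled frame anyway, but keeping it makes the intent explicit.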