I have the following example dataframe (normal dataframe has more than 1000 rows)
df = pd.DataFrame({'P1':['jaap','tim','piet','tim','tim'],
'P2':['piet','jaap','jaap','piet','jaap'],
'Count1':[2, 3, np.nan, np.nan, np.nan], 'Count2':[3, 1, np.nan, np.nan, np.nan]})
print(df)
P1 P2 Count1 Count2
0 jaap piet 2.0 3.0
1 tim jaap 3.0 1.0
2 piet jaap NaN NaN
3 tim piet NaN NaN
4 tim jaap NaN NaN
Now I want to find a neat way that fills in the NaN
values, according to the following rule:
The names found in P1 and P2 have to be the same.
Thus the nan values found in row number 2
have to be the same as the values in row 0
, only the values have to be swapped since the names are also swapped. The nan values in row 3
should be kept as nan, since the combination of tim
and piet
is not found in any prior row. The nan values in row 4
have to be the same as the values in row 1
.
So the desired result is:
P1 P2 Count1 Count2
0 jaap piet 2.0 3.0
1 tim jaap 3.0 1.0
2 piet jaap 3.0 2.0
3 tim piet NaN NaN
4 tim jaap 3.0 1.0
This question is very similar: Pandas fillna using groupby
Only if the proposed solution in that article is applied to the question in this post, the result is slightly off:
df.groupby(['P1','P2'])[['Count1','Count2']].apply(lambda x: x.fillna(method = 'ffill'))
print(df)
Count1 Count2
P1 P2
0 jaap piet 2.0 3.0
1 tim jaap 3.0 1.0
2 piet jaap NaN NaN
3 tim piet NaN NaN
4 tim jaap 3.0 1.0
As you can see the names jaap
and piet
in row 0
changed column in row 2
, thus it doesn't work.
Idea is create DataFrame with changed order first in concat
with rename
, remove rows with missing values and possible duplicates and change original values by DataFrame.update
:
d = {'P2':'P1','P1':'P2','Count1':'Count2','Count2':'Count1'}
df1 = (pd.concat([df, df.rename(columns=d)])
.dropna(subset=['Count1','Count2'])
.drop_duplicates(['P1','P2']))
df = df.set_index(['P1','P2'])
df1 = df1.set_index(['P1','P2'])
df.update(df1)
df = df.reset_index()
print (df)
P1 P2 Count1 Count2
0 jaap piet 2.0 3.0
1 tim jaap 3.0 1.0
2 piet jaap 3.0 2.0
3 tim piet NaN NaN
4 tim jaap 3.0 1.0