I would like to fill missing value in 2 columns. There are Date and Cat2 should be filled with the value of another row based on the last date for predefined Cat1 (predefined in previous filled rows), for example:
Data Example:
Day Date Cat1 Cat2
1 31/12/17 cat mouse
2 01/09/18 cat mouse
3 27/05/18 dog elephant
4 NaN cat NaN
5 NaN cat NaN
So, NaN
of Date
column should be filled based on the last existed date for cat
values from Cat1
column - 01/09/18
. The NaN
in Cat2
column should be filled by mouse
as defined values in row which has already selected for Date
- 01/09/18
.
I read that looping through each row would be very bad practice and that it would be better to do everything in one way.
I'd appreciate for any idea.Thanks)
I tried to do it by fillna
method, but it fill by last values without condition for Cat1.
data.fillna(method='ffill', inplace = True)
Actual result is:
Day Date Cat1 Cat2
1 31/12/17 cat mouse
2 01/09/18 cat mouse
3 27/05/18 dog elephant
4 27/05/18 cat elephant
5 27/05/18 cat elephant
Expected result should be:
Day Date Cat1 Cat2
1 31/12/17 cat mouse
2 01/09/18 cat mouse
3 27/05/18 dog elephant
4 01/09/18 cat mouse
5 01/09/18 cat mouse
I believe you need GroupBy.ffill
with DataFrame.reindex
for same order like original DataFrame
:
df = df.groupby('Cat1').ffill().reindex(df.columns, axis=1)
print (df)
Day Date Cat1 Cat2
0 1 31/12/17 cat mouse
1 2 01/09/18 cat mouse
2 3 27/05/18 dog elephant
3 4 01/09/18 cat mouse
4 5 01/09/18 cat mouse