Search code examples
pythonpandasdataframefillna

Python - fill NA by value from previous rows based on identifier column


I would like to fill missing value in 2 columns. There are Date and Cat2 should be filled with the value of another row based on the last date for predefined Cat1 (predefined in previous filled rows), for example:

Data Example:

Day  Date      Cat1      Cat2
1    31/12/17  cat       mouse
2    01/09/18  cat       mouse
3    27/05/18  dog       elephant
4       NaN    cat       NaN
5       NaN    cat       NaN

So, NaN of Date column should be filled based on the last existed date for cat values from Cat1 column - 01/09/18. The NaN in Cat2 column should be filled by mouse as defined values in row which has already selected for Date - 01/09/18.

I read that looping through each row would be very bad practice and that it would be better to do everything in one way.

I'd appreciate for any idea.Thanks)

I tried to do it by fillna method, but it fill by last values without condition for Cat1.

data.fillna(method='ffill', inplace = True)

Actual result is:

Day  Date      Cat1      Cat2
1    31/12/17  cat       mouse
2    01/09/18  cat       mouse
3    27/05/18  dog       elephant
4    27/05/18  cat       elephant
5    27/05/18  cat       elephant

Expected result should be:

Day  Date      Cat1      Cat2
1    31/12/17  cat       mouse
2    01/09/18  cat       mouse
3    27/05/18  dog       elephant
4    01/09/18  cat       mouse
5    01/09/18  cat       mouse

Solution

  • I believe you need GroupBy.ffill with DataFrame.reindex for same order like original DataFrame:

    df = df.groupby('Cat1').ffill().reindex(df.columns, axis=1)
    print (df)
       Day      Date Cat1      Cat2
    0    1  31/12/17  cat     mouse
    1    2  01/09/18  cat     mouse
    2    3  27/05/18  dog  elephant
    3    4  01/09/18  cat     mouse
    4    5  01/09/18  cat     mouse