This is my dataset, and I am trying to fill the NaN values in the date columns with the minimum value.
I am trying to fill the NaN in the date columns with the min from that date column for the ag_id (grouped-by element). When I do the below I get unexpected output.
df_test_revenue_1["1st_rev_month"] = df_test_revenue_1.groupby("ag_id").transform(lambda x: x.fillna(x.min()))
Unexpected output from executing the above:
I was expecting all the values of the column 1st_rev_month to be 2017-10-01. Instead, it looks like it is picking the values for 1st_rev_month from the column revenue_month.
My end goal is to get this result (applying the same logic to the rest of the date columns, except the revenue_month column).
Your code:
df_test_revenue_1.groupby("ag_id").transform(lambda x: x.fillna(x.min()))
never refers to the column "1st_rev_month". You refer to it only on the left-hand side, which means "put the result in the 1st_rev_month column". But what is the result? Without a column selection, the transform is applied to every column of the dataframe, not just to "1st_rev_month".
You need to take the minimum only over the "1st_rev_month" column.
The fix:
df_test_revenue_1["1st_rev_month"]=df_test_revenue_1.groupby("ag_id")['1st_rev_month'].transform(lambda x: x.fillna(x.min()))
Adding ['1st_rev_month'] after groupby restricts the transform to the relevant column only.
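To apply the same logic to the rest of the date columns, one option is to loop over them and run the same per-column transform. This is only a minimal sketch: the sample values below are made up, and "last_rev_month" is a hypothetical second date column standing in for whatever other date columns your dataframe has.

```python
import pandas as pd

# Made-up sample data; only the names ag_id, revenue_month and 1st_rev_month
# come from the question. "last_rev_month" is a hypothetical extra date column.
df = pd.DataFrame({
    "ag_id": [1, 1, 1, 2, 2],
    "revenue_month": pd.to_datetime(
        ["2017-10-01", "2017-11-01", "2017-12-01", "2018-01-01", "2018-02-01"]
    ),
    "1st_rev_month": pd.to_datetime(
        ["2017-10-01", None, None, None, "2018-01-01"]
    ),
    "last_rev_month": pd.to_datetime(
        [None, "2017-12-01", None, "2018-02-01", None]
    ),
})

# Date columns to fill; revenue_month is deliberately left out.
date_cols = ["1st_rev_month", "last_rev_month"]

for col in date_cols:
    # Select the single column after groupby so that the minimum is taken
    # over that column only, within each ag_id group.
    df[col] = df.groupby("ag_id")[col].transform(lambda x: x.fillna(x.min()))

print(df)
```

Selecting the column inside the loop keeps each fill independent, so revenue_month is never touched.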