python pandas pandas-groupby fillna

Issue when I group by and fill NaN with the min date value of each group


This is my dataset. I am trying to fill the NaN values in the date columns with the minimum value.

Specifically, I want to fill each NaN in a date column with the minimum of that column within its ag_id group (the group-by key). When I run the code below, I get unexpected output.

df_test_revenue_1["1st_rev_month"] = df_test_revenue_1.groupby("ag_id").transform(lambda x: x.fillna(x.min()))

Unexpected output from executing the above:

[image: screenshot of the unexpected output]

I was expecting all the values of the column 1st_rev_month to be 2017-10-01. Instead, it looks like it is picking the values for 1st_rev_month from the column revenue_month.

My end goal is to get this result (applying the same logic to the rest of the date columns, except the revenue_month column):

[image: screenshot of the expected result]


Solution

  • Your code:

    df_test_revenue_1.groupby("ag_id").transform(lambda x: x.fillna(x.min()))
    

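    never refers to the column "1st_rev_month". You only refer to it on the left-hand side, which means "put the result in the 1st_rev_month column". But what is the result? Because no column is selected after groupby, the lambda is applied to every column of the dataframe, so each column is filled with its own per-group minimum, not just "1st_rev_month".

    You need to take the minimum only over the "1st_rev_month" column.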

    The fix:

    df_test_revenue_1["1st_rev_month"] = df_test_revenue_1.groupby("ag_id")['1st_rev_month'].transform(lambda x: x.fillna(x.min()))
    

    Adding ['1st_rev_month'] after the groupby call restricts the transform to the relevant column only.
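
To apply the same fix to the rest of the date columns, one option is to loop over them and select each column after the groupby. Below is a minimal sketch under some assumptions: the sample values are made up, `last_rev_month` is a hypothetical second date column (the question does not name the others), and `df` stands in for `df_test_revenue_1`.

```python
import pandas as pd

# Made-up sample data. "ag_id", "revenue_month" and "1st_rev_month" come from
# the question; "last_rev_month" is a hypothetical extra date column.
df = pd.DataFrame({
    "ag_id": [1, 1, 1, 2, 2],
    "revenue_month": pd.to_datetime(
        ["2017-10-01", "2017-11-01", "2017-12-01", "2018-01-01", "2018-02-01"]
    ),
    "1st_rev_month": pd.to_datetime(
        ["2017-10-01", None, None, None, "2018-01-01"]
    ),
    "last_rev_month": pd.to_datetime(
        [None, "2017-12-01", None, "2018-02-01", None]
    ),
})

# Date columns to fill; revenue_month is deliberately left out.
date_cols = ["1st_rev_month", "last_rev_month"]

for col in date_cols:
    # Select the single column after groupby so the minimum is taken
    # over that column only, within each ag_id group.
    df[col] = df.groupby("ag_id")[col].transform(lambda x: x.fillna(x.min()))

print(df)
```

Selecting the column inside the loop keeps each assignment one Series to one column, so there is no ambiguity about which column ends up on the left-hand side.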