Tags: python, pandas, pandas-groupby

Filling Missing Date Column using groupby method


I have a dataframe that looks something like:


+---+----+---------------+------------+------------+
|   | id |     date1     |   date2    | days_ahead |
+---+----+---------------+------------+------------+
| 0 |  1 |    2021-10-21 | 2021-10-24 | 3          |
| 1 |  1 |    2021-10-22 | NaN        | NaN        |
| 2 |  1 |    2021-11-16 | 2021-11-24 | 8          |
| 3 |  2 |    2021-10-22 | 2021-10-24 | 2          |
| 4 |  2 |    2021-10-22 | 2021-10-24 | 2          |
| 5 |  3 |    2021-10-26 | 2021-10-31 | 5          |
| 6 |  3 |    2021-10-30 | 2021-11-04 | 5          |
| 7 |  3 |    2021-11-02 | NaN        | NaN        |
| 8 |  3 |    2021-11-04 | 2021-11-04 | 0          |
| 9 |  4 |    2021-10-28 | NaN        | NaN        |
+---+----+---------------+------------+------------+

I am trying to fill the missing date2 values by adding the median days_ahead of each id group (rounded to whole days) to date1.

For example:
Median of id 1 = 5.5 which rounds to 6
filled value of date2 at index 1 should be 2021-10-28

Similarly, for id 3 Median = 5
filled value of date2 at index 7 should be 2021-11-07

And for id 4, Median = NaN (the group has no days_ahead values),
so the filled value of date2 at index 9 should fall back to date1: 2021-10-28


I tried:

df['date2'].fillna(df.groupby('id')['days_ahead'].transform('median'), inplace = True)

But this fills date2 with the numeric median values rather than dates.
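A minimal sketch of why this happens, using only the id and days_ahead columns from the table above: the transform returns a float Series aligned with the original rows, so fillna has nothing date-like to insert. The variable name medians is only illustrative.

```python
import numpy as np
import pandas as pd

# Only the two columns needed to reproduce the group medians
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 3, 3, 3, 3, 4],
    "days_ahead": [3, np.nan, 8, 2, 2, 5, 5, np.nan, 0, np.nan],
})

# transform('median') broadcasts each group's median back to every row
medians = df.groupby("id")["days_ahead"].transform("median")

print(medians.dtype)     # a float dtype, not a datetime or timedelta
print(medians.tolist())  # one value per original row
```

The values are day counts, so they still need to be converted to a timedelta and added to date1 before they can fill a date column.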

I could use apply with a lambda to detect the numeric values and convert them to dates, but how can I use groupby and fillna together directly?


Solution

  • Round the group medians, convert them with to_timedelta, add them to date1 using the fill_value parameter, and use the result to replace the missing values:

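    import pandas as pd

    # Both columns must be datetimes before any date arithmetic
    df['date1'] = pd.to_datetime(df['date1'])
    df['date2'] = pd.to_datetime(df['date2'])

    # Per-row group median of days_ahead, rounded and converted to a timedelta
    td = pd.to_timedelta(df.groupby('id')['days_ahead'].transform('median').round(), unit='d')
    # Groups with no median get a zero offset via fill_value, so date2 falls back to date1
    df['date2'] = df['date2'].fillna(df['date1'].add(td, fill_value=pd.Timedelta(0)))

    print(df)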
       id      date1      date2  days_ahead
    0   1 2021-10-21 2021-10-24         3.0
    1   1 2021-10-22 2021-10-28         NaN
    2   1 2021-11-16 2021-11-24         8.0
    3   2 2021-10-22 2021-10-24         2.0
    4   2 2021-10-22 2021-10-24         2.0
    5   3 2021-10-26 2021-10-31         5.0
    6   3 2021-10-30 2021-11-04         5.0
    7   3 2021-11-02 2021-11-07         NaN
    8   3 2021-11-04 2021-11-04         0.0
    9   4 2021-10-28 2021-10-28         NaN
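
For anyone who wants to run this end to end, here is a self-contained sketch that rebuilds the sample frame from the question. It is a variant of the approach above, not the exact code: instead of passing fill_value to add, it replaces the missing offsets with a zero timedelta before adding. The names median_days and offset are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question; None becomes NaT in date2
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 3, 3, 3, 3, 4],
    "date1": pd.to_datetime([
        "2021-10-21", "2021-10-22", "2021-11-16", "2021-10-22", "2021-10-22",
        "2021-10-26", "2021-10-30", "2021-11-02", "2021-11-04", "2021-10-28",
    ]),
    "date2": pd.to_datetime([
        "2021-10-24", None, "2021-11-24", "2021-10-24", "2021-10-24",
        "2021-10-31", "2021-11-04", None, "2021-11-04", None,
    ]),
    "days_ahead": [3, np.nan, 8, 2, 2, 5, 5, np.nan, 0, np.nan],
})

# Group median broadcast to every row, rounded to whole days.
# Series.round rounds exact halves to the nearest even value (5.5 -> 6.0).
median_days = df.groupby("id")["days_ahead"].transform("median").round()

# Convert day counts to timedeltas; groups with no median get a zero offset
offset = pd.to_timedelta(median_days, unit="D").fillna(pd.Timedelta(0))

# Only rows where date2 is missing are filled with date1 + offset
df["date2"] = df["date2"].fillna(df["date1"] + offset)

print(df)
```

Filling the offset with a zero timedelta first keeps the addition a plain `+`, which may be easier to read than the fill_value argument. Note that days_ahead itself stays NaN for the filled rows, as in the output above.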