Filling missing dates in pandas groupby with aggregate functions

I have a dataframe where each id may have gaps. I'd like to fill in missing dates (with NaN values) within each group before using aggregate functions as they don't seem to go by date, but I'm getting the error below.

import pandas as pd

df = pd.DataFrame({'date': ['2014-08-01',
                            '2014-08-02', 
                            '2014-08-03',
                            '2014-08-04',
                            '2014-08-10',
                            '2014-08-15',
                            '2014-08-12',
                            '2014-08-15',
                            '2014-08-20',
                            '2014-08-21',
                            '2014-08-22',                            
                            '2014-08-30',                            
                           ], 
                   'value': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12], 

                   'id': ['a', 'a', 'a', 'a', 'a', 'a',  'b', 'b', 'b', 'b', 'b', 'b']})

df['date'] = pd.to_datetime(df['date'])

df.sort_values('date', ascending=True).groupby('id').set_index('date').resample('1D').rolling(2).mean()

AttributeError: 'DataFrameGroupBy' object has no attribute 'set_index'

Solution

IIUC use Series.asfreq for add consecutive datetimes:

df['date'] = pd.to_datetime(df['date'])

df1 = (df.sort_values('date', ascending=True)
        .set_index('date')
        .groupby('id')
        .apply(lambda x: x.asfreq('d').rolling(2).mean())

print (df1)

               value
id date             
a  2014-08-01    NaN
   2014-08-02    1.5
   2014-08-03    2.5
   2014-08-04    3.5
   2014-08-05    NaN
   2014-08-06    NaN
   2014-08-07    NaN
   2014-08-08    NaN
   2014-08-09    NaN
   2014-08-10    NaN
   2014-08-11    NaN
   2014-08-12    NaN
   2014-08-13    NaN
   2014-08-14    NaN
   2014-08-15    NaN
b  2014-08-12    NaN
   2014-08-13    NaN
   2014-08-14    NaN
   2014-08-15    NaN
   2014-08-16    NaN
   2014-08-17    NaN
   2014-08-18    NaN
   2014-08-19    NaN
   2014-08-20    NaN
   2014-08-21    9.5
   2014-08-22   10.5
   2014-08-23    NaN
   2014-08-24    NaN
   2014-08-25    NaN
   2014-08-26    NaN
   2014-08-27    NaN
   2014-08-28    NaN
   2014-08-29    NaN
   2014-08-30    NaN

df['date'] = pd.to_datetime(df['date'])

df2 = (df.sort_values('date', ascending=True)
        .set_index('date')
        .groupby('id')['value']
        .apply(lambda x: x.asfreq('d').rolling(2, min_periods=1).mean()))

print (df2)
id  date      
a   2014-08-01     1.0
    2014-08-02     1.5
    2014-08-03     2.5
    2014-08-04     3.5
    2014-08-05     4.0
    2014-08-06     NaN
    2014-08-07     NaN
    2014-08-08     NaN
    2014-08-09     NaN
    2014-08-10     5.0
    2014-08-11     5.0
    2014-08-12     NaN
    2014-08-13     NaN
    2014-08-14     NaN
    2014-08-15     6.0
b   2014-08-12     7.0
    2014-08-13     7.0
    2014-08-14     NaN
    2014-08-15     8.0
    2014-08-16     8.0
    2014-08-17     NaN
    2014-08-18     NaN
    2014-08-19     NaN
    2014-08-20     9.0
    2014-08-21     9.5
    2014-08-22    10.5
    2014-08-23    11.0
    2014-08-24     NaN
    2014-08-25     NaN
    2014-08-26     NaN
    2014-08-27     NaN
    2014-08-28     NaN
    2014-08-29     NaN
    2014-08-30    12.0
Name: value, dtype: float64

If need forward filling missing values:

df['date'] = pd.to_datetime(df['date'])

s = (df.sort_values('date', ascending=True)
        .set_index('date')
        .groupby('id')['value']
        .apply(lambda x: x.asfreq('d', method='ffill')))

print (s)
id  date      
a   2014-08-01     1
    2014-08-02     2
    2014-08-03     3
    2014-08-04     4
    2014-08-05     4
    2014-08-06     4
    2014-08-07     4
    2014-08-08     4
    2014-08-09     4
    2014-08-10     5
    2014-08-11     5
    2014-08-12     5
    2014-08-13     5
    2014-08-14     5
    2014-08-15     6
b   2014-08-12     7
    2014-08-13     7
    2014-08-14     7
    2014-08-15     8
    2014-08-16     8
    2014-08-17     8
    2014-08-18     8
    2014-08-19     8
    2014-08-20     9
    2014-08-21    10
    2014-08-22    11
    2014-08-23    11
    2014-08-24    11
    2014-08-25    11
    2014-08-26    11
    2014-08-27    11
    2014-08-28    11
    2014-08-29    11
    2014-08-30    12
Name: value, dtype: int64