Resampling timedelta for multivariate data in python pandas

Still new to Pandas:

I have some data (df) which is time series data for different persons and their variables.

The dataframe looks like this:

date1                 date2                 person  var1   var2
2016-01-04 12:56:00   2016-01-05 00:00:00   1       100    200   
2016-01-04 15:11:00   2016-01-05 00:00:00   1       110    120
2016-01-06 18:05:00   2016-01-06 00:00:00   2       220    300
2016-01-07 18:59:00   2016-01-07 00:00:00   2       200    100
2016-01-02 19:30:00   2016-01-02 00:00:00   3       400    500

For each person I'm trying to create a dataframe where:

The index is the timedelta in days between the dates
For each person observations are upsampled so observations for each person has the same length, between -5 and 5 days.
For observations collected on the same day, the mean of variables are taken
Missing values filled with 0

so it looks like this:

days   person  var1   var2
-5     1        0      0   
-4     1        0      0
-3     1        0      0
-2     1        0      0
-1     1        X      X
0      1        Y      Y
...
-5     2        0      0
-4     2        0      0

etc

I've tried using a timedelta index, using a groupby('person') and resampling but I don't know how to get the specified date ranges (-5 to 5) for each person or how to handle the multiple observations.

Pointers very much appreciated, and thank you.

Solution

First, prepare for the full list of persons and interested day difference

persons = df['person'].unique()
date_diffs = np.arange(-5,6)

Then, build a list of all possible combinations for your 'upsampling'.

person_date_diff_combos = [(person, date_diff) for person in persons for date_diff in date_diffs]

Finally, to take average as you need, we first group data by person and by date_diff, take the mean for both variables, then reindex to create the missing rows, and finally fill the missing rows with zeros.

df = df.groupby([
    'person',
    (df['date2'] - df['date1']).dt.days.rename('date_diff'),
])\
    [['var1', 'var2']].mean()\
    .reindex(person_date_diff_combos, axis=0)\
    .fillna(0)