Resample pandas df with multiple groupbys so each condition has the same number of total days of data

I have been going round in circles with this and haven't been able to figure it out.

Suppose I have the following dataframe:

df = pd.DataFrame({
    "person_id": ["1", "1", "1", "1", "2", "2", "2", "2", "3", "3", "3", "3"],
    "event": ["Alert1", "Alert1", "Alert1", "Alert2", "Alert1", "Alert1", "Alert1", "Alert2", "Alert2", "Alert2", "Alert2", "Alert2"],
    "mode": ["Manual", "Manual", "Auto", "Manual", "Auto", "Auto", "Auto", "Manual", "Manual", "Manual", "Auto", "Manual"],
    "date": ["2020-01-01", "2020-01-01", "2020-01-03", "2020-01-03", "2020-01-03", "2020-01-03", "2020-01-04", "2020-01-04", "2020-01-04", "2020-01-04", "2020-01-05", "2020-01-05"]
}
)

df

index	person_id	event	mode	date
0	1	Alert1	Manual	2020-01-01
1	1	Alert1	Manual	2020-01-01
2	1	Alert1	Auto	2020-01-03
3	1	Alert2	Manual	2020-01-03
4	2	Alert1	Auto	2020-01-03
5	2	Alert1	Auto	2020-01-03
6	2	Alert1	Auto	2020-01-04
7	2	Alert2	Manual	2020-01-04
8	3	Alert2	Manual	2020-01-04
9	3	Alert2	Manual	2020-01-04
10	3	Alert2	Auto	2020-01-05
11	3	Alert2	Manual	2020-01-05

What I want is the count of each possible combination per possible day (the minimum date would be the first date appearing in the dataset, in this case 2020-01-01 and the maximum date would be the last date appearing in the dataset, in this case 2020-01-05). For example, in the case of the df above, the output would look like this:

index	person_id	event	mode	date	count
0	1	Alert1	Manual	2020-01-01	2
1	1	Alert1	Auto	2020-01-01	0
2	1	Alert2	Manual	2020-01-01	0
3	1	Alert2	Auto	2020-01-01	0
4	1	Alert1	Manual	2020-01-02	0
5	1	Alert1	Auto	2020-01-02	0
6	1	Alert2	Manual	2020-01-02	0
7	1	Alert2	Auto	2020-01-02	0
8	1	Alert1	Manual	2020-01-03	0
9	1	Alert1	Auto	2020-01-03	1
10	1	Alert2	Manual	2020-01-03	1
11	1	Alert2	Auto	2020-01-03	0
12	1	Alert1	Manual	2020-01-04	0
13	1	Alert1	Auto	2020-01-04	0
14	1	Alert2	Manual	2020-01-04	0
15	1	Alert2	Auto	2020-01-04	0
16	1	Alert1	Manual	2020-01-05	0
17	1	Alert1	Auto	2020-01-05	0
18	1	Alert2	Manual	2020-01-05	0
19	1	Alert2	Auto	2020-01-05	0
20	2	Alert1	Manual	2020-01-01	0
21	2	Alert1	Auto	2020-01-01	0
22	2	Alert2	Manual	2020-01-01	0
23	2	Alert2	Auto	2020-01-01	0
24	2	Alert1	Manual	2020-01-02	0
25	2	Alert1	Auto	2020-01-02	0
26	2	Alert2	Manual	2020-01-02	0
27	2	Alert2	Auto	2020-01-02	0
28	2	Alert1	Manual	2020-01-03	0
29	2	Alert1	Auto	2020-01-03	2
30	2	Alert2	Manual	2020-01-03	0
31	2	Alert2	Auto	2020-01-03	0
32	2	Alert1	Manual	2020-01-04	0
33	2	Alert1	Auto	2020-01-04	1
34	2	Alert2	Manual	2020-01-04	1
35	2	Alert2	Auto	2020-01-04	0
36	2	Alert1	Manual	2020-01-05	0
37	2	Alert1	Auto	2020-01-05	0
38	2	Alert2	Manual	2020-01-05	0
39	2	Alert2	Auto	2020-01-05	0
40	3	Alert1	Manual	2020-01-01	0
41	3	Alert1	Auto	2020-01-01	0
42	3	Alert2	Manual	2020-01-01	0
43	3	Alert2	Auto	2020-01-01	0
44	3	Alert1	Manual	2020-01-02	0
45	3	Alert1	Auto	2020-01-02	0
46	3	Alert2	Manual	2020-01-02	0
47	3	Alert2	Auto	2020-01-02	0
48	3	Alert1	Manual	2020-01-03	0
49	3	Alert1	Auto	2020-01-03	0
50	3	Alert2	Manual	2020-01-03	0
51	3	Alert2	Auto	2020-01-03	0
52	3	Alert1	Manual	2020-01-04	0
53	3	Alert1	Auto	2020-01-04	0
54	3	Alert2	Manual	2020-01-04	2
55	3	Alert2	Auto	2020-01-04	0
56	3	Alert1	Manual	2020-01-05	0
57	3	Alert1	Auto	2020-01-05	0
58	3	Alert2	Manual	2020-01-05	1
59	3	Alert2	Auto	2020-01-05	1

Importantly, each combination should have the exact same number of unique datetimes at the end, so if I run the following line of code:

df_summarized.groupby(['person_id', 'event', 'mode'])['date'].nunique().reset_index()

The result should clearly show that each combination has 5 unique days of data.

How could I achieve this?

Thanks in advance

Solution

IIUC, what you need to do is first create a finite set of all possible combinations, and then count their occurences.

import pandas as pd
import numpy as np
from itertools import product

# Create the original DataFrame
df = pd.DataFrame({
    "person_id": ["1", "1", "1", "1", "2", "2", "2", "2", "3", "3", "3", "3"],
    "event": ["Alert1", "Alert1", "Alert1", "Alert2", "Alert1", "Alert1", "Alert1", "Alert2", "Alert2", "Alert2", "Alert2", "Alert2"],
    "mode": ["Manual", "Manual", "Auto", "Manual", "Auto", "Auto", "Auto", "Manual", "Manual", "Manual", "Auto", "Manual"],
    "date": ["2020-01-01", "2020-01-01", "2020-01-03", "2020-01-03", "2020-01-03", "2020-01-03", "2020-01-04", "2020-01-04", "2020-01-04", "2020-01-04", "2020-01-05", "2020-01-05"]
})

df['date'] = pd.to_datetime(df['date'])

person_ids = df['person_id'].unique()
events = df['event'].unique()
modes = df['mode'].unique()
dates = pd.date_range(df['date'].min(), df['date'].max(), freq='D')

all_combinations = pd.DataFrame(list(product(person_ids, events, modes, dates)), columns=['person_id', 'event', 'mode', 'date'])

count_df = df.groupby(['person_id', 'event', 'mode', 'date']).size().reset_index(name='count')

result = all_combinations.merge(count_df, on=['person_id', 'event', 'mode', 'date'], how='left').fillna(0)
result.reset_index(drop=True, inplace