I have the following dataframe, with the following values. I want to insert rows in order to have a row for every person (Toby, Jane, David), and for every month in 2020. If there is no value for x or y, then fill with 0.
ID Name Date x y
0 001 Toby 2020-01-01 15 NaN
1 001 Toby 2020-02-01 12 7
2 001 Toby 2020-05-01 7 1
3 001 Toby 2020-07-01 NaN 1
4 002 Jane 2020-11-01 20 1
5 002 Jane 2020-12-01 21 10
6 003 David 2020-07-01 -3 2
The resulting dataframe should have 36 rows, 12, for each person.
ID Name Date x y
0 001 Toby 2020-01-01 15 0
1 001 Toby 2020-02-01 12 7
2 001 Toby 2020-03-01 0 0
3 001 Toby 2020-04-01 0 0
4 001 Toby 2020-05-01 7 1
5 001 Toby 2020-06-01 0 0
6 001 Toby 2020-07-01 0 1
7 001 Toby 2020-08-01 0 0
8 001 Toby 2020-09-01 0 0
9 001 Toby 2020-10-01 0 0
10 001 Toby 2020-11-01 0 0
11 001 Toby 2020-12-01 0 0
12 002 Jane 2020-01-01 0 0
13 002 Jane 2020-02-01 0 0
14 002 Jane 2020-03-01 0 0
15 002 Jane 2020-04-01 0 0
16 002 Jane 2020-05-01 0 0
17 002 Jane 2020-06-01 0 0
18 002 Jane 2020-07-01 0 0
19 002 Jane 2020-08-01 0 0
20 002 Jane 2020-09-01 0 0
21 002 Jane 2020-10-01 0 0
22 002 Jane 2020-11-01 20 1
23 002 Jane 2020-12-01 21 10
24 003 David 2020-01-01 0 0
25 003 David 2020-02-01 0 0
26 003 David 2020-03-01 0 0
27 003 David 2020-04-01 0 0
28 003 David 2020-05-01 0 0
29 003 David 2020-06-01 0 0
30 003 David 2020-07-01 -3 2
31 003 David 2020-08-01 0 0
32 003 David 2020-09-01 0 0
33 003 David 2020-10-01 0 0
34 003 David 2020-11-01 0 0
35 003 David 2020-12-01 0 0
I looked into reindex
, and managed to make it work on a single series. But I haven't found a way to generate rows dynamically on a dataframe to then fill the missing values.
Any help would be appreciated.
You can use reindex
for the purpose:
# list of the desired dates
# make sure that it has the same type with `Date` in your data
# here I assume strings
dates = pd.Series([f'2020-{x}-01' for x in range(1,13)]), name='Date')
(df.set_index(['Date']).groupby(['ID','Name'])
.apply(lambda x: x.drop(['ID', 'Name'],axis=1).reindex(dates).fillna(0))
.reset_index()
)