Search code examples
pythonpandasnanfillreindex

Pandas insert rows based on value and fill with 0


I have the following dataframe, with the following values. I want to insert rows in order to have a row for every person (Toby, Jane, David), and for every month in 2020. If there is no value for x or y, then fill with 0.

    ID  Name    Date        x   y
0   001 Toby    2020-01-01  15  NaN
1   001 Toby    2020-02-01  12  7
2   001 Toby    2020-05-01  7   1
3   001 Toby    2020-07-01  NaN 1
4   002 Jane    2020-11-01  20  1
5   002 Jane    2020-12-01  21  10
6   003 David   2020-07-01  -3  2

The resulting dataframe should have 36 rows, 12, for each person.

ID  Name        Date        x   y
0   001 Toby    2020-01-01  15  0
1   001 Toby    2020-02-01  12  7
2   001 Toby    2020-03-01  0   0
3   001 Toby    2020-04-01  0   0
4   001 Toby    2020-05-01  7   1
5   001 Toby    2020-06-01  0   0
6   001 Toby    2020-07-01  0   1
7   001 Toby    2020-08-01  0   0
8   001 Toby    2020-09-01  0   0
9   001 Toby    2020-10-01  0   0
10  001 Toby    2020-11-01  0   0
11  001 Toby    2020-12-01  0   0
12  002 Jane    2020-01-01  0   0
13  002 Jane    2020-02-01  0   0
14  002 Jane    2020-03-01  0   0
15  002 Jane    2020-04-01  0   0
16  002 Jane    2020-05-01  0   0
17  002 Jane    2020-06-01  0   0
18  002 Jane    2020-07-01  0   0
19  002 Jane    2020-08-01  0   0
20  002 Jane    2020-09-01  0   0
21  002 Jane    2020-10-01  0   0
22  002 Jane    2020-11-01  20  1
23  002 Jane    2020-12-01  21  10
24  003 David   2020-01-01  0   0
25  003 David   2020-02-01  0   0
26  003 David   2020-03-01  0   0
27  003 David   2020-04-01  0   0
28  003 David   2020-05-01  0   0
29  003 David   2020-06-01  0   0
30  003 David   2020-07-01  -3  2
31  003 David   2020-08-01  0   0
32  003 David   2020-09-01  0   0
33  003 David   2020-10-01  0   0
34  003 David   2020-11-01  0   0
35  003 David   2020-12-01  0   0

I looked into reindex, and managed to make it work on a single series. But I haven't found a way to generate rows dynamically on a dataframe to then fill the missing values.

Any help would be appreciated.


Solution

  • You can use reindex for the purpose:

    # list of the desired dates
    # make sure that it has the same type with `Date` in your data
    # here I assume strings
    dates = pd.Series([f'2020-{x}-01' for x in range(1,13)]), name='Date')
    
    (df.set_index(['Date']).groupby(['ID','Name'])
       .apply(lambda x: x.drop(['ID', 'Name'],axis=1).reindex(dates).fillna(0))
       .reset_index()
    )