Search code examples
pandasdataframenumpymulti-index

Reindexing and filling missing time series values in a multi-indexed Pandas DataFrame as zero while retaining original values


My question is similar to this one : Filling in date gaps in MultiIndex Pandas Dataframe.

I however want to group by A and B and then use a user defined range to re-index and each of these dates need to be there in the index. So in this case I want to re-index from May 15th 2024 to Aug 5th 2024 in daily intervals and each combination of column A and column b should have all these dates. The dates where there is no value should be zero. Column data type for Column A and Column B is string and Value us int.

Table 1:

Column_A Column_B date Value
a a 6/11/2024 22
a a 7/2/2024 35
a a 7/9/2024 14
a a 7/30/2024 9
a a 7/31/2024 4
a a 08/04/2024 40
a a 08/05/2024 18
a b 7/9/2024 4
a b 8/6/2024 2
a b 9/3/2024 5
a c 7/9/2024 1
a c 8/3/2024 2
b a 8/5/2024 3

Table 2: reindexed example for where Column A=a and column B =b

Column_A Column_B date Value
a a 5/30/2024 0
a a 5/31/2024 0
a a 6/1/2024 0
a a 6/2/2024 0
a a 6/3/2024 0
a a 6/4/2024 0
a a 6/5/2024 0
a a 6/6/2024 0
a a 6/7/2024 0
a a 6/8/2024 0
a a 6/9/2024 0
a a 6/10/2024 0
a a 6/11/2024 22
a a 6/12/2024 0
a a 6/13/2024 0
a a 6/14/2024 0
a a 6/15/2024 0
a a 6/16/2024 0
a a 6/17/2024 0
a a 6/18/2024 0
a a 6/19/2024 0
a a 6/20/2024 0
a a 6/21/2024 0
a a 6/22/2024 0
a a 6/23/2024 0
a a 6/24/2024 0
a a 6/25/2024 0
a a 6/26/2024 0
a a 6/27/2024 0
a a 6/28/2024 0
a a 6/29/2024 0
a a 6/30/2024 0
a a 7/1/2024 0
a a 7/2/2024 35
a a 7/3/2024 0
a a 7/4/2024 0
a a 7/5/2024 0
a a 7/6/2024 0
a a 7/7/2024 0
a a 7/8/2024 0
a a 7/9/2024 14
a a 7/10/2024 0
a a 7/11/2024 0
a a 7/12/2024 0
a a 7/13/2024 0
a a 7/14/2024 0
a a 7/15/2024 0
a a 7/16/2024 0
a a 7/17/2024 0
a a 7/18/2024 0
a a 7/19/2024 0
a a 7/20/2024 0
a a 7/21/2024 0
a a 7/22/2024 0
a a 7/23/2024 0
a a 7/24/2024 0
a a 7/25/2024 0
a a 7/26/2024 0
a a 7/27/2024 0
a a 7/28/2024 0
a a 7/29/2024 0
a a 7/30/2024 9
a a 7/31/2024 4
a a 8/1/2024 0
a a 8/2/2024 0
a a 8/3/2024 0
a a 8/4/2024 40
a a 8/5/2024 18

Based on the post I tried: date_now = datetime.now().date()
date_period = date_now-timedelta(days=67)
idx = pd.date_range(date_period, date_now)
new_index = pd.MultiIndex.from_product(list(idx))--> this gives me an error "Input must be list-like"

Could some one please help? Thanks in advance


Solution

  • Code

    Your Column_A and Column_B are columns, not indexes, so I don't think you need to use reindex. Create a Cartesian product (merge with join='cross' )

    start, end = '2024-05-15', '2024-08-05'
    tmp = pd.date_range(start, end).to_frame(name='date')
    
    out = (
        df[['Column_A', 'Column_B']]
        .drop_duplicates()
        .merge(tmp, how='cross')
        .merge(df.assign(date=pd.to_datetime(df['date'])), how='left')
        .assign(Value=lambda x: x['Value'].fillna(0))
    )
    

    out

        Column_A Column_B       date  Value
    0          a        a 2024-05-15    0.0
    1          a        a 2024-05-16    0.0
    2          a        a 2024-05-17    0.0
    ..       ...      ...        ...    ...
    329        b        a 2024-08-03    0.0
    330        b        a 2024-08-04    0.0
    331        b        a 2024-08-05    3.0
    
    [332 rows x 4 columns]
    

    example code

    import pandas as pd
    data = {'Column_A': ['a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'b'], 'Column_B': ['a', 'a', 'a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'a'], 'date': ['6/11/2024', '7/2/2024', '7/9/2024', '7/30/2024', '7/31/2024', '08/04/2024', '08/05/2024', '7/9/2024', '8/6/2024', '9/3/2024', '7/9/2024', '8/3/2024', '8/5/2024'], 'Value': [22, 35, 14, 9, 4, 40, 18, 4, 2, 5, 1, 2, 3]}
    df = pd.DataFrame(data)