My question is similar to this one : Filling in date gaps in MultiIndex Pandas Dataframe.
I however want to group by A and B and then use a user defined range to re-index and each of these dates need to be there in the index. So in this case I want to re-index from May 15th 2024 to Aug 5th 2024 in daily intervals and each combination of column A and column b should have all these dates. The dates where there is no value should be zero. Column data type for Column A and Column B is string and Value us int.
Table 1:
Column_A | Column_B | date | Value |
---|---|---|---|
a | a | 6/11/2024 | 22 |
a | a | 7/2/2024 | 35 |
a | a | 7/9/2024 | 14 |
a | a | 7/30/2024 | 9 |
a | a | 7/31/2024 | 4 |
a | a | 08/04/2024 | 40 |
a | a | 08/05/2024 | 18 |
a | b | 7/9/2024 | 4 |
a | b | 8/6/2024 | 2 |
a | b | 9/3/2024 | 5 |
a | c | 7/9/2024 | 1 |
a | c | 8/3/2024 | 2 |
b | a | 8/5/2024 | 3 |
Table 2: reindexed example for where Column A=a and column B =b
Column_A | Column_B | date | Value |
---|---|---|---|
a | a | 5/30/2024 | 0 |
a | a | 5/31/2024 | 0 |
a | a | 6/1/2024 | 0 |
a | a | 6/2/2024 | 0 |
a | a | 6/3/2024 | 0 |
a | a | 6/4/2024 | 0 |
a | a | 6/5/2024 | 0 |
a | a | 6/6/2024 | 0 |
a | a | 6/7/2024 | 0 |
a | a | 6/8/2024 | 0 |
a | a | 6/9/2024 | 0 |
a | a | 6/10/2024 | 0 |
a | a | 6/11/2024 | 22 |
a | a | 6/12/2024 | 0 |
a | a | 6/13/2024 | 0 |
a | a | 6/14/2024 | 0 |
a | a | 6/15/2024 | 0 |
a | a | 6/16/2024 | 0 |
a | a | 6/17/2024 | 0 |
a | a | 6/18/2024 | 0 |
a | a | 6/19/2024 | 0 |
a | a | 6/20/2024 | 0 |
a | a | 6/21/2024 | 0 |
a | a | 6/22/2024 | 0 |
a | a | 6/23/2024 | 0 |
a | a | 6/24/2024 | 0 |
a | a | 6/25/2024 | 0 |
a | a | 6/26/2024 | 0 |
a | a | 6/27/2024 | 0 |
a | a | 6/28/2024 | 0 |
a | a | 6/29/2024 | 0 |
a | a | 6/30/2024 | 0 |
a | a | 7/1/2024 | 0 |
a | a | 7/2/2024 | 35 |
a | a | 7/3/2024 | 0 |
a | a | 7/4/2024 | 0 |
a | a | 7/5/2024 | 0 |
a | a | 7/6/2024 | 0 |
a | a | 7/7/2024 | 0 |
a | a | 7/8/2024 | 0 |
a | a | 7/9/2024 | 14 |
a | a | 7/10/2024 | 0 |
a | a | 7/11/2024 | 0 |
a | a | 7/12/2024 | 0 |
a | a | 7/13/2024 | 0 |
a | a | 7/14/2024 | 0 |
a | a | 7/15/2024 | 0 |
a | a | 7/16/2024 | 0 |
a | a | 7/17/2024 | 0 |
a | a | 7/18/2024 | 0 |
a | a | 7/19/2024 | 0 |
a | a | 7/20/2024 | 0 |
a | a | 7/21/2024 | 0 |
a | a | 7/22/2024 | 0 |
a | a | 7/23/2024 | 0 |
a | a | 7/24/2024 | 0 |
a | a | 7/25/2024 | 0 |
a | a | 7/26/2024 | 0 |
a | a | 7/27/2024 | 0 |
a | a | 7/28/2024 | 0 |
a | a | 7/29/2024 | 0 |
a | a | 7/30/2024 | 9 |
a | a | 7/31/2024 | 4 |
a | a | 8/1/2024 | 0 |
a | a | 8/2/2024 | 0 |
a | a | 8/3/2024 | 0 |
a | a | 8/4/2024 | 40 |
a | a | 8/5/2024 | 18 |
Based on the post I tried:
date_now = datetime.now().date()
date_period = date_now-timedelta(days=67)
idx = pd.date_range(date_period, date_now)
new_index = pd.MultiIndex.from_product(list(idx))--> this gives me an error
"Input must be list-like"
Could some one please help? Thanks in advance
Code
Your Column_A and Column_B are columns, not indexes, so I don't think you need to use reindex
. Create a Cartesian product (merge
with join='cross'
)
start, end = '2024-05-15', '2024-08-05'
tmp = pd.date_range(start, end).to_frame(name='date')
out = (
df[['Column_A', 'Column_B']]
.drop_duplicates()
.merge(tmp, how='cross')
.merge(df.assign(date=pd.to_datetime(df['date'])), how='left')
.assign(Value=lambda x: x['Value'].fillna(0))
)
out
Column_A Column_B date Value
0 a a 2024-05-15 0.0
1 a a 2024-05-16 0.0
2 a a 2024-05-17 0.0
.. ... ... ... ...
329 b a 2024-08-03 0.0
330 b a 2024-08-04 0.0
331 b a 2024-08-05 3.0
[332 rows x 4 columns]
example code
import pandas as pd
data = {'Column_A': ['a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'a', 'b'], 'Column_B': ['a', 'a', 'a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'a'], 'date': ['6/11/2024', '7/2/2024', '7/9/2024', '7/30/2024', '7/31/2024', '08/04/2024', '08/05/2024', '7/9/2024', '8/6/2024', '9/3/2024', '7/9/2024', '8/3/2024', '8/5/2024'], 'Value': [22, 35, 14, 9, 4, 40, 18, 4, 2, 5, 1, 2, 3]}
df = pd.DataFrame(data)