I am attempting to do something similar to the reindexing in Pandas reindex date index by group revisited
except I have one extra level of indexing involved. My data is in the following structure: [Grid Cell, Site, Date, Value], and I want to reindex all sites within a particular grid cell to the same datetimeindex.
import pandas as pd
import numpy as np
data_dict = {'Grid Cell':[1,1,1,1,1,1,1,2,2,2,2,2,2,2],'Site':['A','A','A','A','B','B','B','C','C','C','D','D','D','D'],'Date':['1999-01-01','1999-02-01','1999-03-01','1999-04-01','1999-01-01','1999-02-01','1999-03-01','2000-01-01','2000-02-01','2000-03-01','2000-01-01','2000-02-01','2000-03-01','2000-04-01'],'Value':[-2.45,-3.72,1.34,4.56,0.23,3.26,6.76,-7.45,-6.43,-2.18,-10.72,-8.97,-5.32,-1.73]}
df = pd.DataFrame.from_dict(data_dict)
unique_dates = df.groupby('Grid Cell')['Date'].unique()
df = df.set_index('Date')
idx = pd.MultiIndex.from_product([df.index.unique(), df['Grid Cell'].unique()],names=['Date','Grid Cell'])
new_df = df.set_index("Grid Cell", append=True).reindex(idx, fill_value=np.nan).reset_index(level=1)
I am attempting to achieve a similar result to the linked example, such that all sites in Grid Cell 1, would be reindexed to the following datetimeindex: '1999-01-01', '1999-02-01', '1999-03-01', '1999-04-01', and all sites in Grid Cell 2 would be reindexed to: '2000-01-01', '2000-02-01', '2000-03-01', '2000-04-01', and the resultant dataframe would look like:
data_dict_out = {'Grid Cell':[1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2],'Site':['A','A','A','A','B','B','B','B','C','C','C','D','D','D','D'],'Date':['1999-01-01','1999-02-01','1999-03-01','1999-04-01','1999-01-01','1999-02-01','1999-03-01','1999-04-01','2000-01-01','2000-02-01','2000-03-01','2000-04-01','2000-01-01','2000-02-01','2000-03-01','2000-04-01'],'Value':[-2.45,-3.72,1.34,4.56,0.23,3.26,6.76,np.nan,-7.45,-6.43,-2.18,np.nan,-10.72,-8.97,-5.32,-1.73]}
df_out = pd.DataFrame.from_dict(data_dict_out)
Unfortunately my partial solution currently produces a "ValueError: cannot handle a non-unique multi-index!" error. Any suggestions on what I need to change here?
IIUC, you can do:
def reindex(g):
idx = pd.MultiIndex.from_product(
[g["Grid Cell"].unique(), g["Site"].unique(), g["Date"].unique()],
names=["Grid Cell", "Site", "Date"],
)
return g.set_index(["Grid Cell", "Site", "Date"]).reindex(idx, fill_value=np.nan)
df = df.groupby("Grid Cell", group_keys=False).apply(reindex).reset_index()
print(df)
Prints:
Grid Cell Site Date Value
0 1 A 1999-01-01 -2.45
1 1 A 1999-02-01 -3.72
2 1 A 1999-03-01 1.34
3 1 A 1999-04-01 4.56
4 1 B 1999-01-01 0.23
5 1 B 1999-02-01 3.26
6 1 B 1999-03-01 6.76
7 1 B 1999-04-01 NaN
8 2 C 2000-01-01 -7.45
9 2 C 2000-02-01 -6.43
10 2 C 2000-03-01 -2.18
11 2 C 2000-04-01 NaN
12 2 D 2000-01-01 -10.72
13 2 D 2000-02-01 -8.97
14 2 D 2000-03-01 -5.32
15 2 D 2000-04-01 -1.73