I have a program that works with pandas dataframes, using a multiindex of 2 levels (dates and data) such as:
Date_Time Data
date1 a
b
c
date2 a
b
c
date3 a
b
c
...
So all the functions use the pandas IndexSlice when having to use/modify the contents of the df, like:
df.loc[pd.IndexSlice[:,'a'],:]
This worked great, easy to read, short and efficient, and made possible a lot of one-lines functions.
However, I am currently having to differenciate the data based on some properties in order to not having them merge when doing a resample, and I am doing it by adding a third level to the index when necessary:
Date_Time Property Data
date1 1 a
1 b
1 c
date2 2 a
2 b
2 c
date3 1 a
1 b
1 c
...
The goal is to be able to do a groupby with a resample over time and end up with this multiindex:
Date_Time Property Data
Period1 1 a
1 b
1 c
2 a
2 b
2 c
Period2 1 a
1 b
1 c
...
So, the problem is that df.loc[pd.IndexSlice[:,'a'],:]
no longer works, I would have to change it to
df.loc[pd.IndexSlice[:,:,'a'],:]
But that means changing the code itself everytime I use that dataframe with the extra column.
Isn't there any way to define the slice in a flexible way??
I would like to define the slice using variables, like in list comprehension, so it is future protected against more changes in the length and order of the multiindex levels. But as far as I checked, that is not possible, so what should I do??
I could define the slice using try-except blocks at the beginning of each function, inside the block that already makes sure that level and level_value exists; or move the property level to the right so I could still use pd.IndexSlice[:,'a']
(but in the future I might end up with this problem again)
EDIT: Here is some code to generate a dataframe that uses this kind of index:
iter1=[["03/07/2023 07:40:00", "03/07/2023 07:50:00"], ["S=0.1"],["Probe1","Probe2","Probe3"]]
iter2=[["03/07/2023 07:45:00", "03/07/2023 07:55:00"], ["S=0.2"],["Probe1","Probe2","Probe3"]]
idx1=pd.MultiIndex.from_product(iter1, names=["Date_Time", "Property",'Data'])
idx2=pd.MultiIndex.from_product(iter2, names=["Date_Time", "Property",'Data'])
df_aux1=pd.DataFrame(np.random.randn(6, 3), index=idx1, columns=['X','Y','Error'])
df_aux2=pd.DataFrame(np.random.randn(6, 3), index=idx2, columns=['X','Y','Error'])
df=pd.concat([df_aux1,df_aux2]).sort_index(level='Date_Time')
The exact data and logic is unclear, but since you have named levels you could use Index.get_level_values
and boolean indexing:
df.loc[df.index.get_level_values('Data') == 'a']
Or by position:
df.loc[df.index.get_level_values(-1) == 'a']