I have a DataFrame that is looking like this (using .to_markdown() which is showing the 3 index columns as one) :
id | col1 | col2 | |
---|---|---|---|
(2022, 'may', 27) | LB46 | 10 | 12 |
(2022, 'may', 28) | LB46 | 10 | 12 |
(2022, 'may', 29) | LB46 | 10 | 12 |
(2022, 'may', 30) | LB46 | 10 | 12 |
(2022, 'may', 31) | LB46 | 40 | 12 |
(2022, 'june', 1) | LB46 | 50 | 12 |
(2022, 'june', 2) | LB46 | 90 | 12 |
(2022, 'june', 3) | LB46 | 110 | 12 |
An other preview using .head() :
This is a multi-index Dataframe with 3 index which here are "year", "month" and "day". These index are arbitrary choose for the example, it could be any kind of index. Then I can't rely on date conversion, I want to rely on original order which is the one shown.
But I also want to do a slice between the line (2022, 'may', 29) and (2022, 'june', 1).
To do that I try to use DataFrame.loc :
df= df.loc[('2022','may','29'): ('2022', 'june', '1')]
But it leads to this error :
pandas.errors.UnsortedIndexError: 'Key length (3) was greater than MultiIndex lexsort depth (1)'
So then I tried to get my DataFrame sorted using :
df_csv.sort_index(Inplace=True)
The problem with that is that whatever options of sort_index I use, the order of my DataFrame kept altered (as for example there are no possibility to choose an order other than ascending or descending). Again I don't know the comparison function to use for the sort, so what I really want it to have it "sorted" but without any alteration of the original order.
Is there a way to achieve this ? My goal is to obtain these row as a result after the .loc :
id | col1 | col2 | |
---|---|---|---|
(2022, 'may', 29) | LB46 | 10 | 12 |
(2022, 'may', 30) | LB46 | 10 | 12 |
(2022, 'may', 31) | LB46 | 40 | 12 |
(2022, 'june', 1) | LB46 | 50 | 12 |
An alternative solution I would try is to get the (first) index of the row (2022, 'may', 29), the (last) index of the row (2022, 'june', 1) and then to do an iloc.
I solve my problem as follow :
First convert the indexes type (it was not explicitly described but my example include several index types and as it's not know by advance it could be an issue) :
if df is not None:
if df.index.nlevels > 1:
df.index = pd.MultiIndex.from_frame(
pd.DataFrame(index=df.index).reset_index().astype(str)
)
else:
df.index = df.index.astype(str)
Then it's possible to do an iloc as followed to get the expected result:
df = df.iloc[
df.index.get_loc(('2022','may','29':) : df.index.get_loc('2022','june','01')
]