I have MultiIndex with dates and numbers, and I have list with dates as string:
df.columns
>>>MultiIndex([(2020-10-22, 7),
(2020-10-22, 8),
(2020-10-22, 9),
(2020-10-22, 10),
(2020-10-22, 11),
(2020-10-22, 12),
(2020-10-22, 13),
(2020-10-22, 14),
...
dts
>>>['2020-10-24',
'2020-10-24',
'2020-10-25',
'2020-10-27',
'2020-10-27',
'2020-10-28',
'2020-10-28',
'2020-10-28',
'2020-10-30',
...
hrs
>>>[8,
9,
14,
10,
13,
11,
12,
13,
7,
12,
7,
11,
12,
7,
...
I want the dates of the list and the dataframe to be the same type so I can later slice the dataframe based on the dates that are in the list ( I have also another list of hours...)
I have tried to convert the dates in the list to be the same by changing them to datetime:
dts=[datetime.strptime(x,'%Y-%m-%d') for x in dts]
>>>[datetime.datetime(2020, 10, 24, 0, 0),
datetime.datetime(2020, 10, 24, 0, 0),
datetime.datetime(2020, 10, 25, 0, 0),
datetime.datetime(2020, 10, 27, 0, 0),
datetime.datetime(2020, 10, 27, 0, 0),
datetime.datetime(2020, 10, 28, 0, 0),
datetime.datetime(2020, 10, 28, 0, 0),
datetime.datetime(2020, 10, 28, 0, 0),
...
but then when I try to slice the dataframe based on those dates it doesn't work and I beleive is becuase it's different type.
for d,h in zip(dts,hrs):
date=d
hour=h
#print(date,hour)
idx = pd.IndexSlice
tmp1 = df.loc[:,idx[date,hour]]
...
---> 12 tmp1 = df.loc[:,idx[date,hour]]
KeyError: (datetime.datetime(2020, 10, 24, 0, 0), 8)
I'm not sure but I believe the 0,0 is the problem , However I couldn't sovle it.
My end goal: to be able to slice from my dataframe the correct columns based on the date and the hour.
edit: the dtypes of the columns of df:
Timestamp
2020-10-22 7 float64
8 float64
9 float64
...
2020-11-29 12 float64
Edit 2:
df.index.get_level_values(0).dtype
>>>dtype('int64')
edit3:
print(df.columns.get_level_values(0)[:4])
>>>Index(['code', 'index', 2020-10-22, 2020-10-22], dtype='object')
print(df.columns.get_level_values(1)[:4])
>>>Index(['', '', 7, 8], dtype='object', name='Timestamp')
Sample:
a = ['code', 'index', '2020-10-22', '2020-10-22']
b = ['', '', 7, 8]
df = pd.DataFrame(1, index=[0], columns=pd.MultiIndex.from_tuples(zip(a, b)))
First is necessary set not datetimes columns to index:
df = df.set_index([('code',''),('index','')]).rename_axis(['code','index'])
print (df)
2020-10-22
7 8
code index
1 1 1 1
And use MultiIndex.remove_unused_levels
, then convert first level of columns to datetimes:
df.columns = df.columns.remove_unused_levels()
df.columns = df.columns.set_levels(pd.to_datetime(df.columns.levels[0]), level=0)
print (df)
2020-10-22
7 8
code index
1 1 1 1
Now is possible selecting:
idx = pd.IndexSlice
print (df.loc[:,idx['2020-10-22',7]])
2020-10-22
7
code index
1 1 1