Search code examples
pythonpandasdatetimeslicemulti-index

Change item in list to datetime to match MultiIndex columns type


I have MultiIndex with dates and numbers, and I have list with dates as string:

df.columns

>>>MultiIndex([(2020-10-22,  7),
            (2020-10-22,  8),
            (2020-10-22,  9),
            (2020-10-22, 10),
            (2020-10-22, 11),
            (2020-10-22, 12),
            (2020-10-22, 13),
            (2020-10-22, 14),
            ...
dts

>>>['2020-10-24',
 '2020-10-24',
 '2020-10-25',
 '2020-10-27',
 '2020-10-27',
 '2020-10-28',
 '2020-10-28',
 '2020-10-28',
 '2020-10-30',
...


hrs

>>>[8,
 9,
 14,
 10,
 13,
 11,
 12,
 13,
 7,
 12,
 7,
 11,
 12,
 7,
...

I want the dates of the list and the dataframe to be the same type so I can later slice the dataframe based on the dates that are in the list ( I have also another list of hours...)

I have tried to convert the dates in the list to be the same by changing them to datetime:

dts=[datetime.strptime(x,'%Y-%m-%d') for x in dts]

>>>[datetime.datetime(2020, 10, 24, 0, 0),
 datetime.datetime(2020, 10, 24, 0, 0),
 datetime.datetime(2020, 10, 25, 0, 0),
 datetime.datetime(2020, 10, 27, 0, 0),
 datetime.datetime(2020, 10, 27, 0, 0),
 datetime.datetime(2020, 10, 28, 0, 0),
 datetime.datetime(2020, 10, 28, 0, 0),
 datetime.datetime(2020, 10, 28, 0, 0),
...

but then when I try to slice the dataframe based on those dates it doesn't work and I beleive is becuase it's different type.

for d,h in zip(dts,hrs):
    date=d
    hour=h
    #print(date,hour)
    idx = pd.IndexSlice
    tmp1 = df.loc[:,idx[date,hour]]
...

---> 12 tmp1 = df.loc[:,idx[date,hour]]

KeyError: (datetime.datetime(2020, 10, 24, 0, 0), 8)

I'm not sure but I believe the 0,0 is the problem , However I couldn't sovle it.

My end goal: to be able to slice from my dataframe the correct columns based on the date and the hour.

edit: the dtypes of the columns of df:

        Timestamp
2020-10-22  7            float64
            8            float64
            9            float64
                          ...   
2020-11-29  12           float64

Edit 2:

df.index.get_level_values(0).dtype
>>>dtype('int64')

edit3:

print(df.columns.get_level_values(0)[:4])
>>>Index(['code', 'index', 2020-10-22, 2020-10-22], dtype='object')

print(df.columns.get_level_values(1)[:4])

>>>Index(['', '', 7, 8], dtype='object', name='Timestamp')


Solution

  • Sample:

    a = ['code', 'index', '2020-10-22', '2020-10-22']
    b = ['', '', 7, 8]
    df = pd.DataFrame(1, index=[0], columns=pd.MultiIndex.from_tuples(zip(a, b)))
    

    First is necessary set not datetimes columns to index:

    df = df.set_index([('code',''),('index','')]).rename_axis(['code','index'])
    print (df)
               2020-10-22   
                        7  8
    code index              
    1    1              1  1
    

    And use MultiIndex.remove_unused_levels, then convert first level of columns to datetimes:

    df.columns = df.columns.remove_unused_levels()
    
    df.columns = df.columns.set_levels(pd.to_datetime(df.columns.levels[0]), level=0)
    print (df)
               2020-10-22   
                        7  8
    code index              
    1    1              1  1
    

    Now is possible selecting:

    idx = pd.IndexSlice
    print (df.loc[:,idx['2020-10-22',7]])
               2020-10-22
                        7
    code index           
    1    1              1