Search code examples
pythondataframedictionarymulti-indexdefaultdict

Converting dictionary to a multi indexed dataframe


I have a defaultdict that is constructed as below:

data = defaultdict(dict)
symbol_list = [
    'ETHUSDT',
    'BTCUSDT'
]
for symbol in symbol_list:
    data[symbol] = load_binance_data(c, symbol, '2021-12-23', timeframe='5m')

This is the axes of the dataframes stored in the dictionary as values:

[DatetimeIndex(['2021-12-23 00:05:00', '2021-12-23 00:10:00',
               '2021-12-23 00:15:00', '2021-12-23 00:20:00',
               '2021-12-23 00:25:00', '2021-12-23 00:30:00',
               '2021-12-23 00:35:00', '2021-12-23 00:40:00',
               '2021-12-23 00:45:00', '2021-12-23 00:50:00',
               ...
               '2021-12-24 19:05:00', '2021-12-24 19:10:00',
               '2021-12-24 19:15:00', '2021-12-24 19:20:00',
               '2021-12-24 19:25:00', '2021-12-24 19:30:00',
               '2021-12-24 19:35:00', '2021-12-24 19:40:00',
               '2021-12-24 19:45:00', '2021-12-24 19:50:00'],
              dtype='datetime64[ns]', name='time', length=526, freq=None), Index(['open', 'high', 'low', 'close', 'volume'],
      dtype='object')]

I want to transform this dictionary to a single dataframe with multiple index as below:

[DatetimeIndex(['2021-12-23 00:05:00', '2021-12-23 00:10:00',
                   '2021-12-23 00:15:00', '2021-12-23 00:20:00',
                   '2021-12-23 00:25:00', '2021-12-23 00:30:00',
                   '2021-12-23 00:35:00', '2021-12-23 00:40:00',
                   '2021-12-23 00:45:00', '2021-12-23 00:50:00',
                   ...
                   '2021-12-24 19:05:00', '2021-12-24 19:10:00',
                   '2021-12-24 19:15:00', '2021-12-24 19:20:00',
                   '2021-12-24 19:25:00', '2021-12-24 19:30:00',
                   '2021-12-24 19:35:00', '2021-12-24 19:40:00',
                   '2021-12-24 19:45:00', '2021-12-24 19:50:00'],
              dtype='datetime64[ns]', name='time', freq=None), 
              MultiIndex([
                  ('open', 'ETHUSDT'),
                  ('open', 'BTCUSDT'),
                  ('high', 'ETHUSDT'),
                  ('high', 'BTCUSDT'),
                  ('low', 'ETHUSDT'),
                  ('low', 'BTCUSDT'),
                  ('close', 'ETHUSDT'),
                  ('close', 'BTCUSDT'),
                  ('volume', 'ETHUSDT'),
                  ('volume', 'BTCUSDT')],
           names=['Attributes', 'Symbols'])]

How can I do this conversion?

Thanks in advance,


Solution

  • If I understood correctly, you have two DataFrames:

    • DataFrame 1, let's call it SYMBOL1 :

      open high low close volume
      time
      2021-12-23 00:05:00 1 3 5 7 9
      2021-12-23 00:10:00 2 4 6 8 10
    • And DataFrame 2, let's call it SYMBOL2 :

      open high low close volume
      time
      2021-12-23 00:05:00 -1 -3 -5 -7 -9
      2021-12-23 00:10:00 -2 -4 -6 -8 -10

    That you're trying to turn into a DataFrame with axes of the shape you gave above. If so then here's one way to do it:

    import pandas as pd
    
    # Code to Create the DataFrames in the example :
    d = {'open': [1, 2], 'high': [3, 4], 'low':[5,6], 'close':[7,8], 'volume':[9,10]}
    df1 = pd.DataFrame(d, index=pd.DatetimeIndex(
        ['2021-12-23 00:05:00', '2021-12-23 00:10:00'], name='time'))
    df2 = df1*-1
    
    # The transformation : 
    new_axis_df1 = pd.MultiIndex.from_product(
        [df1.axes[1].values, ['SYMBOL1']], names=['Attributes', 'Symbols'])
    new_df1 = df1.set_axis(new_axis_df1, axis='columns')
    
    new_axis_df2 = pd.MultiIndex.from_product(
        [df2.axes[1].values, ['SYMBOL2']], names=['Attributes', 'Symbols'])
    new_df2 = df2.set_axis(new_axis_df2, axis='columns')
    
    # Merging the transformed DataFrames
    final_df = new_df1.merge(new_df2, on='time')
    
    #Result : 
    print(final_df.axes)
    

    This produces and outputs the following DataFrame axes :

    [DatetimeIndex(['2021-12-23 00:05:00', '2021-12-23 00:10:00'], dtype='datetime64[ns]', name='time', freq=None),
     MultiIndex([(  'open', 'SYMBOL1'),
                 (  'high', 'SYMBOL1'),
                 (   'low', 'SYMBOL1'),
                 ( 'close', 'SYMBOL1'),
                 ('volume', 'SYMBOL1'),
                 (  'open', 'SYMBOL2'),
                 (  'high', 'SYMBOL2'),
                 (   'low', 'SYMBOL2'),
                 ( 'close', 'SYMBOL2'),
                 ('volume', 'SYMBOL2')],
                names=['Attributes', 'Symbols'])]
    

    An example of the final_df, would look like this:

    Attributes open high low close volume open high low close volume
    Symbols SYMBOL1 SYMBOL1 SYMBOL1 SYMBOL1 SYMBOL1 SYMBOL2 SYMBOL2 SYMBOL2 SYMBOL2 SYMBOL2
    time
    2021-12-23 00:05:00 1 3 5 7 9 -1 -3 -5 -7 -9
    2021-12-23 00:10:00 2 4 6 8 10 -2 -4 -6 -8 -10