Search code examples
jsonmongodbpandasdataframepanel

combine two pandas dataframe into one dataframe "dict type cell" (pd.Panel deprecated)


I'm trying to concat multiples pandas.DataFrame to be saved in a mongodb in just one collection, all the dataframes have the same index/columns and I wanted to save it, in just one document, using to_json() method. Having all the cells of the dataframe as dicts, its probably a good approach. To accomplish that I wanted to concat the dataframes like this:

df1:                
 index   A      B
 1     'A1'   'B1'
 2     'A2'   'B2'
 3     'A3'   'B3'

df2:
 index  A      B
 1    'a1'   'b1'
 2    'a2'   'b2'
 3    'a3'   'b3'

Expected solution:

df_sol:
 index    A                    B
 1        {d1:'A1', d2:'a1'}   {d1:'B1', d2:'b1'}
 2        {d1:'A2', d2:'a2'}   {d1:'B2', d2:'b2'}
 3        {d1:'A3', d2:'a3'}   {d1:'B3', d2:'b3'}

the aproach that im using is

pd.Panel(dict(d1=df1, d2=df2)).apply(pd.Series.to_dict, 0)

                              A                         B
index                                                    
1      {'d1': 'A1', 'd2': 'a1'}  {'d1': 'B1', 'd2': 'b1'}
2      {'d1': 'A2', 'd2': 'a2'}  {'d1': 'B2', 'd2': 'b2'}
3      {'d1': 'A3', 'd2': 'a3'}  {'d1': 'B3', 'd2': 'b3'}

but pd.Panel its deprecated DeprecationWarning : Panel is deprecated and will be removed in a future version.its there a workaround using just pandas? thanks!

Original Question


Solution

  • Solution
    pd.concat + other stuff

    pd.Series(
        pd.concat([df1, df2], axis=1, keys=['d1', 'd2']).stack().to_dict('index')
    ).unstack()
    
                                  A                             B
    1  {'d1': ''A1'', 'd2': ''a1''}  {'d1': ''B1'', 'd2': ''b1''}
    2  {'d1': ''A2'', 'd2': ''a2''}  {'d1': ''B2'', 'd2': ''b2''}
    3  {'d1': ''A3'', 'd2': ''a3''}  {'d1': ''B3'', 'd2': ''b3''}
    

    Explanation
    I want to get [1, 2, 3] and ['A', 'B'] into the index and ['d1', 'd2'] as the columns.

    I start with pd.concat

    pd.concat([df1, df2], axis=1, keys=['d1', 'd2'])
    
             d1          d2      
              A     B     A     B
    index                        
    1      'A1'  'B1'  'a1'  'b1'
    2      'A2'  'B2'  'a2'  'b2'
    3      'A3'  'B3'  'a3'  'b3'
    

    Which almost gets me there. If I follow that with a stack, it will drop the last level of the columns into the last level of the index:

    pd.concat([df1, df2], axis=1, keys=['d1', 'd2']).stack()
    
               d1    d2
    index              
    1     A  'A1'  'a1'
          B  'B1'  'b1'
    2     A  'A2'  'a2'
          B  'B2'  'b2'
    3     A  'A3'  'a3'
          B  'B3'  'b3'
    

    And this is what I want. From here I can use .to_dict('index')

    pd.concat([df1, df2], axis=1, keys=['d1', 'd2']).stack().to_dict('index')
    
    {(1, 'A'): {'d1': "'A1'", 'd2': "'a1'"},
     (1, 'B'): {'d1': "'B1'", 'd2': "'b1'"},
     (2, 'A'): {'d1': "'A2'", 'd2': "'a2'"},
     (2, 'B'): {'d1': "'B2'", 'd2': "'b2'"},
     (3, 'A'): {'d1': "'A3'", 'd2': "'a3'"},
     (3, 'B'): {'d1': "'B3'", 'd2': "'b3'"}}
    

    And pass that back to the pd.Series constructor to get a series of dictionaries.

    pd.Series(
        pd.concat([df1, df2], axis=1, keys=['d1', 'd2']).stack().to_dict('index')
    )
    
    1  A    {'d1': ''A1'', 'd2': ''a1''}
       B    {'d1': ''B1'', 'd2': ''b1''}
    2  A    {'d1': ''A2'', 'd2': ''a2''}
       B    {'d1': ''B2'', 'd2': ''b2''}
    3  A    {'d1': ''A3'', 'd2': ''a3''}
       B    {'d1': ''B3'', 'd2': ''b3''}
    dtype: object
    

    The only thing left to do is unstack which I show in the solution above.