Expanding a Pandas DataFrame Based on Nested Dictionary Values


In short, I'm looking to expand a group-level view into the individual components of those groups, based on a mapping schema I've created.

I have two sets of data: transactional data in df, and a nested dictionary in nested that maps each group to its components and their shares.

import pandas as pd
nested = {"Group A":{"Component 1 Share": 0.25, "Component 2 Share": 0.25, "Component 3 Share": 0.25, "Component 4 Share": 0.25}, 
      "Group B":{"Component 1 Share": 0.5, "Component 5 Share": 0.5}}
data = {'date': ['2018-12-01', '2018-12-01', '2018-12-02', '2018-12-02', '2018-12-02'],
    'groups':['Group A', 'Group B', 'Group A', 'Group B', 'Group A'],
    'sold': [100, 200, 200, 300, 60]}
df = pd.DataFrame(data, columns = ['date', 'groups','sold'])

My goal is to use the nested dictionary to get the data into the component-level format below. Both structures are simplified here: the real df is much larger, and the real nested dictionary has many more groups, each with a different number of components.

goal_data = {'date': ['2018-12-01', '2018-12-01', '2018-12-01', '2018-12-01', 
                  '2018-12-01', '2018-12-01', 
                  '2018-12-02', '2018-12-02', '2018-12-02', '2018-12-02',
                  '2018-12-02', '2018-12-02', 
                  '2018-12-02', '2018-12-02', '2018-12-02', '2018-12-02'],
    'components':["Component 1 Share", "Component 2 Share", "Component 3 Share", "Component 4 Share",
                  "Component 1 Share", "Component 5 Share",
                  "Component 1 Share", "Component 2 Share", "Component 3 Share", "Component 4 Share",
                  "Component 1 Share", "Component 5 Share", 
                  "Component 1 Share", "Component 2 Share", "Component 3 Share", "Component 4 Share"],
    'sold': [25, 25, 25, 25,
             100, 100,
             50, 50, 50, 50, 
             150, 150, 
             15,15,15,15]}
component_df = pd.DataFrame(goal_data, columns=["date", "components", "sold"])

I've tried map, apply, lookup, and merge without luck, but I'm sure there's a way to expand the group-level data into its components.


Solution

  • You can start by turning the nested dict into a long-format DataFrame, then merge it with each date's group of rows

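    # Flatten the nested dict into long form: level_0 = component, level_1 = group, 0 = share.
    # dropna() keeps only the group/component pairs that actually exist in nested.
    nestdict_f = pd.DataFrame(nested).stack().dropna().reset_index()

    # Merge each date's rows with the mapping, then concatenate the per-date results
    newdf = pd.concat([y.merge(nestdict_f, left_on='groups', right_on='level_1') for _, y in df.groupby('date')])

    # Scale each group's sold total by the component's share
    newdf['sold'] = newdf['sold'] * newdf[0]

    newdf = newdf[['date', 'level_0', 'sold']].rename(columns={'level_0': 'components'})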
    
    newdf
             date         components   sold
    0  2018-12-01  Component 1 Share   25.0
    1  2018-12-01  Component 2 Share   25.0
    2  2018-12-01  Component 3 Share   25.0
    3  2018-12-01  Component 4 Share   25.0
    4  2018-12-01  Component 1 Share  100.0
    5  2018-12-01  Component 5 Share  100.0
    0  2018-12-02  Component 1 Share   50.0
    1  2018-12-02  Component 2 Share   50.0
    2  2018-12-02  Component 3 Share   50.0
    3  2018-12-02  Component 4 Share   50.0
    4  2018-12-02  Component 1 Share   15.0
    5  2018-12-02  Component 2 Share   15.0
    6  2018-12-02  Component 3 Share   15.0
    7  2018-12-02  Component 4 Share   15.0
    8  2018-12-02  Component 1 Share  150.0
    9  2018-12-02  Component 5 Share  150.0
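
A variation on the same idea, in case it's useful: build the mapping table directly from the dict items and merge once, without the per-date groupby. This is only a sketch under the question's sample data. The names shares, expanded, and result are my own and not part of the answer above. A left merge keeps the rows in the order they appear in df, so the result follows the row order of the goal data in the question.

```python
import pandas as pd

# Sample data copied from the question
nested = {
    "Group A": {"Component 1 Share": 0.25, "Component 2 Share": 0.25,
                "Component 3 Share": 0.25, "Component 4 Share": 0.25},
    "Group B": {"Component 1 Share": 0.5, "Component 5 Share": 0.5},
}
df = pd.DataFrame({
    "date": ["2018-12-01", "2018-12-01", "2018-12-02", "2018-12-02", "2018-12-02"],
    "groups": ["Group A", "Group B", "Group A", "Group B", "Group A"],
    "sold": [100, 200, 200, 300, 60],
})

# Long-format mapping (illustrative name): one row per (group, component)
# pair that actually exists in nested, so no NaN rows are created.
shares = pd.DataFrame(
    [(group, component, share)
     for group, parts in nested.items()
     for component, share in parts.items()],
    columns=["groups", "components", "share"],
)

# Left merge: each row of df is repeated once per component of its group,
# in the original row order of df.
expanded = df.merge(shares, on="groups", how="left")

# Scale each group's sold total by the component's share
expanded["sold"] = expanded["sold"] * expanded["share"]

result = expanded[["date", "components", "sold"]]
print(result)
```

Because the shares in each group sum to 1, result['sold'].sum() should equal df['sold'].sum(), which makes a quick sanity check on a larger mapping. A group missing from nested would show up as a row with NaN in components rather than being silently dropped.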