Search code examples
python-3.xpandasdataframejson-normalize

Aggregate a column of dict into a list of dict with pivot_table - Pandas


I have a dataframe input like this :

data = {'annee': [2015, 2015],
        'medaille': ['Argent', 'Bronze'],
        'categorie': ['Punchs', 'Punchs'],
        'raisonSociale': ['Héritiers H Clément', 'Héritiers H Clément'],
        'Marque': ['Clément', 'Clément'],
        'Medaille_Rhum': [{'annee': 2015,
                           'medaille': 'Argent',
                           'produit': np.nan,
                           'Rhum_Name': 'PUNCH CAFE CLEMENT',
                           'Age': np.nan,
                           'labels': np.nan},
                          {'annee': 2015,
                           'medaille': 'Bronze',
                           'produit': np.nan,
                           'Rhum_Name': 'PUNCH PINA COLADA CLEMENT',
                           'Age': np.nan,
                           'labels': np.nan}],
        'Or': [0, 0],
        'Argent': [1, 0],
        'Bronze': [0, 1],
        'Score': [2, 1]}

df = pd.DataFrame(data)

Yes it's a dataframe of caribbean's rhums

I want to use pivot_table() to have 'annee' (e.g. year) in columns with score values. I achieve it, but I also want to have the 'Rhum_Name' column aggregate in a list of dict.

I tried the following code :

def ListDict(df):
    listDict = []
    listDict.extend(df)
    return listDict


df_ranking = df_ranking.pivot_table(index=['raisonSociale', 'Marque', 'categorie'],
                              columns =['annee', 'Medaille_Rhum'],
                              values = 'Score',
                              fill_value = '',
                              aggfunc = {'annee': sum, 'Medaille_Rhum': ListDict},
                              margins = True, 
                              margins_name = 'Total')

I get this error TypeError: unhashable type: 'dict'. I asume it's because, pivot_table wants to iterate directly over the dict, but I can't picture how to make it work.

Thanks in advance guys !


Solution

    • Convert the dict into a dataframe
    • Convert 'Medaille_Rhum', which is a column of dicts, into a dataframe and join it back to df.
    • Don't try to reshape Medaille_Rhum in the pivot_table

    form the dataframe

    import pandas as pd
    import numpy as np
    
    # create dataframe
    df = pd.DataFrame(data)
    
    # convert the dict in medaille_Rhum to a dataframe and join it back to df
    df = df.join(pd.DataFrame(df.pop('Medaille_Rhum').values.tolist())['Rhum_Name'])
    
    # display(df)
       annee medaille categorie        raisonSociale   Marque  Or  Argent  Bronze  Score                  Rhum_Name
    0   2015   Argent    Punchs  Héritiers H Clément  Clément   0       1       0      2         PUNCH CAFE CLEMENT
    1   2015   Bronze    Punchs  Héritiers H Clément  Clément   0       0       1      1  PUNCH PINA COLADA CLEMENT
    

    create list of Medaille_Rhum

    medaille_rhum = [v for v in data['Medaille_Rhum']]
    
    # print(medaille_rhum)
    [{'annee': 2015,
      'medaille': 'Argent',
      'produit': nan,
      'Rhum_Name': 'PUNCH CAFE CLEMENT',
      'Age': nan,
      'labels': nan},
     {'annee': 2015,
      'medaille': 'Bronze',
      'produit': nan,
      'Rhum_Name': 'PUNCH PINA COLADA CLEMENT',
      'Age': nan,
      'labels': nan}]