Search code examples
dataframedictionarypandas-groupby

generate dict from datarame with grouping columns


I try to generate a json file or dict rom my datframe (grouping the columns)

my datFrame is

    df1 = pd.DataFrame({
        'USER': ['ALL','ALL','BOB','STEVE','PAUL','KEITH','STEVE','STEVE','BOB'],
        'CITY': ['ALL','ALL','PARIS','LONDON','MILAN','MADRID','LONDON','LONDON','PARIS'],
        'TEAMS':['USA','EUROPE','Middle EST','CHINA','JAPAN','MORROCO','Fr','ENGLAN','AUSTRIA'],
        'TASK':['ALL','MANY','ONE','TWO','THREE','FOUR','FIVE','SIX','SEVEN']})

The expected output it should look like this:

exepectdict ={
     [{
     'USER':'ALL',
      'CITY':'ALL',
       'work':
           { 'USA':'ALL',
             'EUROPE':'MANY'
            }
     
     },
      {
     'USER':'BOB',
      'CITY':'PARIS',
       'work':
           { 'Middle EST':'ONE',
             'AUSTRIA':'SEVEN'
            }
     
     },
      {
     'USER':'KEITH',
      'CITY':'MADRID',
       'work':
           { 'MORROCO':'FOUR'
            }
     
     },
           
      {
     'USER':'PAUL',
      'CITY':'MILAN',
       'work':
           { 'JAPAN':'THREE'
            }
     
     },
           
      {
     'USER':'STEVE',
      'CITY':'LONDON',
       'work':
           { 'CHINA':'TWO',
             'Fr':'FIVE',
              'ENGLAN':'SIX'
            }
     
     }
          
    ]}

to do that , I try de group rows (USER?CITY) and generate a list for (TEAMS and TASK columns) :

    df_results=df1.groupby(['USER','CITY'])['TEAMS','TASK'].agg(list)
|USER | CITY    |      TEAMS            |       TASK       |               
|:----|:-------:|:---------------------:|-----------------:|
|ALL  |   ALL   |          [USA, EUROPE]|       [ALL, MANY]|
|BOB  |   PARIS |  [Middle EST, AUSTRIA]|      [ONE, SEVEN]|
|KEITH| MADRID  |              [MORROCO]|            [FOUR]|
|PAUL |  MILAN  |                [JAPAN]|           [THREE]|
|STEVE| LONDON  |    [CHINA, Fr, ENGLAN]|  [TWO, FIVE, SIX]|

but i don't know how generate expected dict Format


Solution

  • Create a column for "work" as it is one to one mapping of TEAMS:TASK during groupby

    df_results = pd.DataFrame(df.groupby(['USER','CITY'])[['TEAMS','TASK']].apply(lambda x:dict(zip(x['TEAMS'],x['TASK']))), columns=['work'])
    df_results.reset_index().to_dict('records')