Search code examples
pythonpandasrowpivot-tablecol

Python summarize row into column (Pandas pivot table)


I have a list of persons with the respective earnings by company like this

Company_code Person Date    Earning1 Earning2
1            Jonh   2014-01 100      200
2            Jonh   2014-01 300      400
1            Jonh   2014-02 500      600
1            Peter  2014-01 300      400
1            Peter  2014-02 500      600

And I would like to summarize into this:

Company_code Person 2014-01_E1 2014-01_E2 2014-02_E1 2014-02_E2
1            Jonh   100        200        300        400
2            Jonh   500        600
1            Peter  300        400        500        600

I had the same problem doing this with SQL which I solved with the code:

with t(Company_code, Person, Dt, Earning1, Earning2) as (
  select 1, 'Jonh', to_date('2014-01-01', 'YYYY-MM-DD'), 100, 200 from dual union all
  select 2, 'Jonh', to_date('2014-01-01', 'YYYY-MM-DD'), 300, 400 from dual union all
  select 1, 'Jonh', to_date('2014-02-01', 'YYYY-MM-DD'), 500, 600 from dual union all
  select 1, 'Peter', to_date('2014-01-01', 'YYYY-MM-DD'), 300, 400 from dual union all
  select 1, 'Peter', to_date('2014-02-01', 'YYYY-MM-DD'), 500, 600 from dual
)
select * 
  from t
 pivot ( 
     sum(Earning1) e1
   , sum(Earning2) e2 
 for dt in (
     to_date('2014-01-01', 'YYYY-MM-DD') "2014-01"
   , to_date('2014-02-01', 'YYYY-MM-DD') "2014-02"
 )
)

COMPANY_CODE    PERSON  2014-01_E1  2014-01_E2  2014-02_E1  2014-02_E2
----------------------------------------------------------------------
           2    Jonh           300         400           -           -
           1    Peter          300         400         500         600
           1    Jonh           100         200         500         600

How can this be achived in python? I'm trying with Pandas pivot_table:

pd.pivot_table(df, columns=['COMPANY_CODE', 'PERSON', 'DATE'], aggfunc=np.sum)

but this just transposes the table ... any clues?


Solution

  • Using user1827356's suggestion:

    df2 = pd.pivot_table(df, rows=['Company_code', 'Person'], cols=['Date'], aggfunc='sum')
    print(df2)
    #                      Earning1           Earning2         
    # Date                  2014-01  2014-02   2014-01  2014-02
    # Company_code Person                                      
    # 1            Jonh         100      500       200      600
    #              Peter        300      500       400      600
    # 2            Jonh         300      NaN       400      NaN
    

    You can flatten the hierarchical columns like this:

    columns = ['{}_E{}'.format(date, earning.replace('Earning', ''))
               for earning, date in df2.columns.tolist()]
    df2.columns = columns
    print(df2)
    #                      2014-01_E1  2014-02_E1  2014-01_E2  2014-02_E2
    # Company_code Person                                                
    # 1            Jonh           100         500         200         600
    #              Peter          300         500         400         600
    # 2            Jonh           300         NaN         400         NaN