Python Pandas Aggregate Series Data Within a DataFrame


Within a DataFrame, I am trying to apply split-apply-combine to a column that contains series data, summing that data element-wise within each group. (I've searched SO but haven't found anything pertaining to series within data frames.)

The data frame:

import pandas as pd
from pandas import Series, DataFrame

import numpy as np

ex = {'account': [1, 1, 1, 2, 2],
      'subaccount': [1, 2, 3, 1, 2],
      'account_type':  ['A', 'A', 'B', 'A', 'B'],
      'data': [(1, 2, 3), (4, 5, 6), (7, 8, 9), (1, 3, 5), (2, 4, 6)]}

df = DataFrame(ex, columns=['account', 'subaccount', 'account_type', 'data'])

Then I group by and aggregate, like so:

result = (df.groupby(['account', 'account_type'])
           .agg({'subaccount': np.sum}))

This gives me

                       subaccount
account  account_type
1        A             3
         B             3
2        A             1
         B             2

but what I want is

                      data
account  account_type
1        A            (5, 7, 9)
         B            (7, 8, 9)
2        A            (1, 3, 5)
         B            (2, 4, 6)

I'm probably missing something obvious, but the solution escapes me.


Solution

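  • This works:

    result = df.groupby(['account', 'account_type'])\
           .apply(lambda x : [sum(y) for y in zip(*x["data"])])
    

    Note that it returns each group's sums as a list, e.g. [5, 7, 9], rather than a tuple. It may also be slow on a large dataset, because the lambda runs in Python once per group.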
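
If speed becomes a concern, one possible alternative is to expand the tuples into ordinary numeric columns, let pandas sum them per group, and pack the sums back into tuples. This is only a sketch, not a benchmarked solution: it assumes every tuple in `data` has the same length, and the names `expanded`, `keys` and `summed` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'account': [1, 1, 1, 2, 2],
    'subaccount': [1, 2, 3, 1, 2],
    'account_type': ['A', 'A', 'B', 'A', 'B'],
    'data': [(1, 2, 3), (4, 5, 6), (7, 8, 9), (1, 3, 5), (2, 4, 6)],
})

# One numeric column per tuple position (assumes equal-length tuples).
expanded = pd.DataFrame(df['data'].tolist(), index=df.index)

# Group the expanded columns by the original key columns and sum them.
keys = [df['account'], df['account_type']]
summed = expanded.groupby(keys).sum()

# Pack each row of sums back into a tuple, keeping the group index.
result = pd.Series(
    [tuple(row) for row in summed.to_numpy().tolist()],
    index=summed.index,
    name='data',
)
print(result)
```

The per-group summation here is done by pandas rather than by a Python lambda; only the final packing into tuples loops in Python, once per group. If you do not need tuples in the result, `summed` on its own may be easier to work with.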