Search code examples
pythonpandasgroup-bycountaggregate

How to aggregate DataFrame to count valus per ID in also with other columns in output Python Pandas?


I have DataFrame in Python Pandas like below:

ID  | COL1 | ... | COL_n
----|------|-----|-------
123 | P    | ... | ...
123 | P    | ... | ...
123 | M    | ... | ...
444 | BM   | ... | ...
567 | BM   | ... | ...
29  | P    | ... | ...

And I need to aggregate above columns so as to have output like below:

  • How many time client bought P -> col: X_P

  • How many time client bought M -> col: X_M

  • How many time client bought in total M + P -> col: X_PM

  • in my real dataset I have many more columns and I need to also have these columns in output

    ID COL1 ... COL_n X_P X_M X_PM
    123 P ... ... 2 1 3
    123 P ... ... 2 1 3
    123 M ... ... 2 1 3
    444 BM ... ... 0 0 0
    567 BM ... ... 0 0 0
    29 P ... ... 1 0 1

How can I do that in Python Pandas ?


Solution

  • You could do two steps with pd.crosstab

    s = (pd.crosstab(df['ID'],df['COL1'])[['M','P']].
            assign(MP=lambda x : x.sum(1)).
            reindex(df['ID']).
            add_prefix('X_'))
    s.index = df.index
    df = df.join(s)
    Out[178]: 
        ID COL1  X_M  X_P  X_MP
    0  123    P    1    2     3
    1  123    P    1    2     3
    2  123    M    1    2     3
    3  444   BM    0    0     0
    4  567   BM    0    0     0
    5   29    P    0    1     1