Search code examples
pythonpandaspivot

Count unique value for each group and subtotal


I have a simple dataframe as below:

import pandas as pd
import numpy as np
df = pd.DataFrame({'BR_NM': ['HN', 'HN', 'HP'], 
                   'CUS_ID': ['12345', '12345', '12345'],
                   'ACC_ID': ['12345_1', '12345_2', '12345_3'],
                   'REGION': ['North', 'North', 'North'], 
                   'CUS_TYPE': ['Individual', 'Individual', 'Individual']})
df
BR_NM   CUS_ID  ACC_ID  REGION  CUS_TYPE
HN      12345   12345_1 North   Individual
HN      12345   12345_2 North   Individual
HP      12345   12345_3 North   Individual

I want to count unique CUS_ID based on BR_NM then sum it based on REGION. In my case, it's just one customer with three account but I want to count it as two customer. Below is my desired Ouput:

REGION  CUS_TYPE    North
0      Individual   2

If I used pivot_table and aggfunc = pd.Series.nunique it just count as 1.

df2 = pd.pivot_table(df, values='CUS_ID', columns='REGION', index='CUS_TYPE', aggfunc=pd.Series.nunique).reset_index()

Thank you.


Solution

  • it's just one customer with three account but I want to count it as two customer.

    Use drop_duplicates based on BR_NM and CUS_ID columns to get unique records:

    >>> (df.drop_duplicates(['BR_NM', 'CUS_ID'])
           .value_counts(['REGION', 'CUS_TYPE'])
           .unstack('REGION').reset_index())
    
    REGION    CUS_TYPE  North
    0       Individual      2
    

    About drop_duplicates:

    >>> df.drop_duplicates(['BR_NM', 'CUS_ID'])
    
      BR_NM CUS_ID   ACC_ID REGION    CUS_TYPE
    0    HN  12345  12345_1  North  Individual
    2    HP  12345  12345_3  North  Individual