pythonpandaspivot

# Count unique value for each group and subtotal

I have a simple dataframe as below:

``````import pandas as pd
import numpy as np
df = pd.DataFrame({'BR_NM': ['HN', 'HN', 'HP'],
'CUS_ID': ['12345', '12345', '12345'],
'ACC_ID': ['12345_1', '12345_2', '12345_3'],
'REGION': ['North', 'North', 'North'],
'CUS_TYPE': ['Individual', 'Individual', 'Individual']})
df
BR_NM   CUS_ID  ACC_ID  REGION  CUS_TYPE
HN      12345   12345_1 North   Individual
HN      12345   12345_2 North   Individual
HP      12345   12345_3 North   Individual
``````

I want to count unique `CUS_ID` based on `BR_NM` then sum it based on `REGION`. In my case, it's just one customer with three account but I want to count it as two customer. Below is my desired Ouput:

``````REGION  CUS_TYPE    North
0      Individual   2
``````

If I used `pivot_table` and `aggfunc = pd.Series.nunique` it just count as 1.

``````df2 = pd.pivot_table(df, values='CUS_ID', columns='REGION', index='CUS_TYPE', aggfunc=pd.Series.nunique).reset_index()
``````

Thank you.

Solution

• it's just one customer with three account but I want to count it as two customer.

Use `drop_duplicates` based on `BR_NM` and `CUS_ID` columns to get unique records:

``````>>> (df.drop_duplicates(['BR_NM', 'CUS_ID'])
.value_counts(['REGION', 'CUS_TYPE'])
.unstack('REGION').reset_index())

REGION    CUS_TYPE  North
0       Individual      2
``````

About `drop_duplicates`:

``````>>> df.drop_duplicates(['BR_NM', 'CUS_ID'])

BR_NM CUS_ID   ACC_ID REGION    CUS_TYPE
0    HN  12345  12345_1  North  Individual
2    HP  12345  12345_3  North  Individual
``````