I have a simple dataframe as below:
import pandas as pd
import numpy as np
df = pd.DataFrame({'BR_NM': ['HN', 'HN', 'HP'],
'CUS_ID': ['12345', '12345', '12345'],
'ACC_ID': ['12345_1', '12345_2', '12345_3'],
'REGION': ['North', 'North', 'North'],
'CUS_TYPE': ['Individual', 'Individual', 'Individual']})
df
BR_NM CUS_ID ACC_ID REGION CUS_TYPE
HN 12345 12345_1 North Individual
HN 12345 12345_2 North Individual
HP 12345 12345_3 North Individual
I want to count unique CUS_ID
based on BR_NM
then sum it based on REGION
. In my case, it's just one customer with three account but I want to count it as two customer. Below is my desired Ouput:
REGION CUS_TYPE North
0 Individual 2
If I used pivot_table
and aggfunc = pd.Series.nunique
it just count as 1.
df2 = pd.pivot_table(df, values='CUS_ID', columns='REGION', index='CUS_TYPE', aggfunc=pd.Series.nunique).reset_index()
Thank you.
it's just one customer with three account but I want to count it as two customer.
Use drop_duplicates
based on BR_NM
and CUS_ID
columns to get unique records:
>>> (df.drop_duplicates(['BR_NM', 'CUS_ID'])
.value_counts(['REGION', 'CUS_TYPE'])
.unstack('REGION').reset_index())
REGION CUS_TYPE North
0 Individual 2
About drop_duplicates
:
>>> df.drop_duplicates(['BR_NM', 'CUS_ID'])
BR_NM CUS_ID ACC_ID REGION CUS_TYPE
0 HN 12345 12345_1 North Individual
2 HP 12345 12345_3 North Individual