I have a dataframe that looks like below
customer_id,month,Group,category,days_ago
A1,Jan,Premium,saf,13
A1,Jan,Premium,ewf,54
A2,Jan,Lost,ds,32
A3,Jan,Lost,dfs,78
A4,Jan,Lost,sdfg,94
A5,Jan,Loyal,sa,14
A6,Jan,Need Attention,ewf,13
A1,Mar,Premium,efWCC,78
A2,Mar,Need Attention,POI
A3,Mar,Lost,QWE
A4,Mar,Need Attention,QOEP
A4,Mar,Need Attention,POTU
A5,Mar,Loyal,FANC
A6,Mar,Lost,FAS
A7,Mar,New,qewr
A8,Mar,New,wqer
t1 = pd.read_clipboard(sep=',')
I would like to do the below
a) Create a matrix against Jan and Mar month
b) Fill the matrix with customer count under each group
I expect my output to be in a table like as below
I tried the below but not sure how to get everything in a neat table
cust_info = t1.groupby(['customer_id','month','Group']).size().reset_index()
group_info = t1.groupby(['customer_id','Group']).size().reset_index()
group_info.merge(cust_info,on='customer_id',how='left')
Is there anyway to capture their movement from one group another between the months Jan
and Mar
? I have a big data of 20K customers. Is there any elegant way to produce the below output?
Use merge
of filtered DataFrames with crosstab
:
df = df[df['month'].eq('Jan')].merge(df[df['month'].eq('Mar')], on='customer_id')
df = pd.crosstab([df['month_x'], df['Group_x']], [df['month_y'], df['Group_y']])
print (df)
month_y Mar
Group_y Lost Loyal Need Attention Premium
month_x Group_x
Jan Lost 1 0 2 0
Loyal 0 1 0 0
Need Attention 1 0 0 0
Premium 0 0 0 1
IIUC use:
df = df.drop_duplicates(['month','customer_id','customer_id', 'Group'])
df = df[df['month'].eq('Jan')].merge(df[df['month'].eq('Mar')],
on='customer_id',
how='outer').fillna('no match')
df = pd.crosstab([df['month_x'], df['Group_x']], [df['month_y'], df['Group_y']])
print (df)
month_y Mar
Group_y Lost Loyal Need Attention New Premium
month_x Group_x
Jan Lost 1 0 2 0 0
Loyal 0 1 0 0 0
Need Attention 1 0 0 0 0
Premium 0 0 0 0 1
no match no match 0 0 0 2 0