Search code examples
pythonpandasdataframematrixgroup-by

pandas - create customer movement matrix


I have a dataframe that looks like below

customer_id,month,Group,category,days_ago
A1,Jan,Premium,saf,13
A1,Jan,Premium,ewf,54
A2,Jan,Lost,ds,32
A3,Jan,Lost,dfs,78
A4,Jan,Lost,sdfg,94
A5,Jan,Loyal,sa,14
A6,Jan,Need Attention,ewf,13

A1,Mar,Premium,efWCC,78
A2,Mar,Need Attention,POI
A3,Mar,Lost,QWE
A4,Mar,Need Attention,QOEP
A4,Mar,Need Attention,POTU
A5,Mar,Loyal,FANC
A6,Mar,Lost,FAS
A7,Mar,New,qewr
A8,Mar,New,wqer

t1 = pd.read_clipboard(sep=',')

I would like to do the below

a) Create a matrix against Jan and Mar month

b) Fill the matrix with customer count under each group

I expect my output to be in a table like as below

enter image description here

I tried the below but not sure how to get everything in a neat table

cust_info = t1.groupby(['customer_id','month','Group']).size().reset_index()
group_info = t1.groupby(['customer_id','Group']).size().reset_index()
group_info.merge(cust_info,on='customer_id',how='left')

Is there anyway to capture their movement from one group another between the months Jan and Mar? I have a big data of 20K customers. Is there any elegant way to produce the below output?

enter image description here


Solution

  • Use merge of filtered DataFrames with crosstab:

    df = df[df['month'].eq('Jan')].merge(df[df['month'].eq('Mar')], on='customer_id')
    
    df = pd.crosstab([df['month_x'], df['Group_x']], [df['month_y'], df['Group_y']])
    print (df)
    month_y                 Mar                             
    Group_y                Lost Loyal Need Attention Premium
    month_x Group_x                                         
    Jan     Lost              1     0              2       0
            Loyal             0     1              0       0
            Need Attention    1     0              0       0
            Premium           0     0              0       1
    

    IIUC use:

    df = df.drop_duplicates(['month','customer_id','customer_id', 'Group'])
    
    df = df[df['month'].eq('Jan')].merge(df[df['month'].eq('Mar')], 
                                         on='customer_id', 
                                         how='outer').fillna('no match')
    
    df = pd.crosstab([df['month_x'], df['Group_x']], [df['month_y'], df['Group_y']])
    print (df)
    month_y                  Mar                                 
    Group_y                 Lost Loyal Need Attention New Premium
    month_x  Group_x                                             
    Jan      Lost              1     0              2   0       0
             Loyal             0     1              0   0       0
             Need Attention    1     0              0   0       0
             Premium           0     0              0   0       1
    no match no match          0     0              0   2       0