Search code examples
pythonpandascollaborative-filtering

How to change this dataframe with python in order to use collaborative filtering


Here is my original data: enter image description here

As you can see.The cust_id column records the consumption record for each ID.And second column means the product name,third is the munber they bought each time.

I want to get this kind of data: enter image description here

The result data shows each customer bought which product and how many.If they never bought,then the data is None.I think this is Sparse matrix.

I have tried many ways and still can't fix it up.....

Maybe pandas?Numpy?


Solution

  • There is problem with duplicates, I add last row with same cust_id and prd_id values for demonstrate it.

    print (df)
        cust_id  prd_id  prd_number
    8       462      40           1
    9       462      46           3
    10      462      59           1
    11      462      63          13
    12      462      67           1
    13      462      82          12
    14      462      88           1
    15      462     163           3
    16      463      68           1
    17      463      90           1
    18      463     159           2
    16      464      93          11
    20      464      94           8
    21      464      96           1
    22      464     142           4
    23      465      50           1
    24      465      50           5
    

    Then need groupby by columns cust_id and prd_id with aggreagting some function like mean() or sum(). Last unstack with replacing NaN to 0:

    print (df.groupby(['cust_id', 'prd_id'])['prd_number'].sum().unstack(fill_value=0))
    prd_id   40   46   50   59   63   67   68   82   88   90   93   94   96   142  \
    cust_id                                                                         
    462        1    3    0    1   13    1    0   12    1    0    0    0    0    0   
    463        0    0    0    0    0    0    1    0    0    1    0    0    0    0   
    464        0    0    0    0    0    0    0    0    0    0   11    8    1    4   
    465        0    0    6    0    0    0    0    0    0    0    0    0    0    0   
    
    prd_id   159  163  
    cust_id            
    462        0    3  
    463        2    0  
    464        0    0  
    465        0    0