pandas, recommendation-engine

User-by-item matrix in pandas


I am working on a recommendation system. I followed this to build a user-by-item matrix, but I get this error: IndexError: index 8928358160 is out of bounds for axis 0 with size 5

Below is a sample of the dataset.

import pandas as pd
import numpy as np

df = pd.read_csv('APRIL.csv')
df = df.drop(columns=['BASKETID'])  # drop the basket ID column
df = df.head(10)
df
Out[89]:
     MEMBERID       SKU  QTY
0  8928358161  37101163    2
1  8928358161  36618858    1
2  8928358161  40855129    1
3  8933444371  35010078    1
4  8932505053  36335949    1
5  8932505053  92100668    1
6  8932505053  36529730    2
7  8921161362  61814893    1
8  8915688100  34732853    1
9  8915688100  35122457    1


n_users = df.MEMBERID.unique().shape[0]
n_items = df.SKU.unique().shape[0]
print(str(n_users) + ' users')
print(str(n_items) + ' items')
5 users
10 items

ratings = np.zeros((n_users, n_items))
for row in df.itertuples():
    ratings[row[1]-1, row[2]-1] = row[3]
ratings
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-92-0a393963bf4c> in <module>()
      1 ratings = np.zeros((n_users, n_items))
      2 for row in df.itertuples():
----> 3     ratings[row[1]-1, row[2]-1] = row[3]
      4 ratings

IndexError: index 8928358160 is out of bounds for axis 0 with size 5

I still do not understand where the index 8928358160 comes from.


Solution

  • Why don't you convert the values to strings? Even though they are integers, they might be read in scientific notation and so end up as floats.

    Try this:

    Convert cust_id and item_number from float to string:

    mergedfinal['cust_id'] = mergedfinal['cust_id'].astype(str)
    mergedfinal['item_number'] = mergedfinal['item_number'].astype(str)
    mergedfinal['SKU'] = mergedfinal['SKU'].astype(str)
    

    Here, mergedfinal is my DataFrame.
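
For what it's worth, the index in the error looks like it comes from the loop itself rather than from the dtype. With itertuples(), row[0] is the DataFrame index and row[1] is MEMBERID, so for the first row row[1]-1 is 8928358161 - 1 = 8928358160, and that value is used as a row position in an array that has only 5 rows. A minimal sketch of one possible fix, assuming the goal is a matrix of summed QTY per (MEMBERID, SKU) pair, is to map each ID to a 0-based position with pd.factorize before filling the array. The sample rows are copied from the question; user_idx, item_idx, users and items are names made up for this illustration.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question's output.
df = pd.DataFrame({
    'MEMBERID': [8928358161, 8928358161, 8928358161, 8933444371, 8932505053,
                 8932505053, 8932505053, 8921161362, 8915688100, 8915688100],
    'SKU': [37101163, 36618858, 40855129, 35010078, 36335949,
            92100668, 36529730, 61814893, 34732853, 35122457],
    'QTY': [2, 1, 1, 1, 1, 1, 2, 1, 1, 1],
})

# factorize assigns each distinct ID a position 0..n-1, in order of first appearance.
user_idx, users = pd.factorize(df['MEMBERID'])
item_idx, items = pd.factorize(df['SKU'])

ratings = np.zeros((len(users), len(items)))

# Add each QTY at its (user, item) position; np.add.at also sums
# repeated (user, item) pairs instead of overwriting them.
np.add.at(ratings, (user_idx, item_idx), df['QTY'].to_numpy())

print(ratings.shape)
```

With this mapping, users[i] and items[j] give the original MEMBERID and SKU for row i and column j of ratings.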