I am working on a recommendation system. I followed this to build a user-by-item matrix. However, I get the error IndexError: index 8928358160 is out of bounds for axis 0 with size 5
Below is an example of the dataset.
import pandas as pd
import numpy as np
df = pd.read_csv('APRIL.csv')
df = df.drop(columns=['BASKETID'])
df = df.head(10)
df
Out[89]:
MEMBERID SKU QTY
0 8928358161 37101163 2
1 8928358161 36618858 1
2 8928358161 40855129 1
3 8933444371 35010078 1
4 8932505053 36335949 1
5 8932505053 92100668 1
6 8932505053 36529730 2
7 8921161362 61814893 1
8 8915688100 34732853 1
9 8915688100 35122457 1
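To reproduce the problem without APRIL.csv, the ten sample rows above can be built inline. This is a sketch that assumes the printed values are the complete content of those rows, with BASKETID already dropped:

```python
import pandas as pd

# The ten sample rows shown above, built inline (BASKETID already dropped).
# Assumption: these printed values are the full content of the rows.
df = pd.DataFrame({
    'MEMBERID': [8928358161, 8928358161, 8928358161, 8933444371,
                 8932505053, 8932505053, 8932505053, 8921161362,
                 8915688100, 8915688100],
    'SKU': [37101163, 36618858, 40855129, 35010078, 36335949,
            92100668, 36529730, 61814893, 34732853, 35122457],
    'QTY': [2, 1, 1, 1, 1, 1, 2, 1, 1, 1],
})

# Same counts as n_users / n_items in the question
print(df.MEMBERID.nunique(), 'users')  # 5 users
print(df.SKU.nunique(), 'items')       # 10 items
```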
n_users = df.MEMBERID.unique().shape[0]
n_items = df.SKU.unique().shape[0]
print(str(n_users) + ' users')
print(str(n_items) + ' items')
5 users
10 items
ratings = np.zeros((n_users, n_items))
for row in df.itertuples():
    ratings[row[1]-1, row[2]-1] = row[3]
ratings
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-92-0a393963bf4c> in <module>()
1 ratings = np.zeros((n_users, n_items))
2 for row in df.itertuples():
----> 3 ratings[row[1]-1, row[2]-1] = row[3]
4 ratings
IndexError: index 8928358160 is out of bounds for axis 0 with size 5
I still don't understand where the index 8928358160 comes from.
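The number comes from the loop itself. `df.itertuples()` yields the DataFrame index as the first field, so in each tuple `row[0]` is the index, `row[1]` is MEMBERID, `row[2]` is SKU and `row[3]` is QTY. `row[1]-1` is therefore the raw member ID minus one (8928358161 - 1 = 8928358160), used as a row position in an array that has only 5 rows (`n_users`); the SKU values would overflow axis 1 in the same way. A small check using the first three sample rows:

```python
import pandas as pd

# First three rows of the sample data: one member, three SKUs.
df = pd.DataFrame({
    'MEMBERID': [8928358161, 8928358161, 8928358161],
    'SKU': [37101163, 36618858, 40855129],
    'QTY': [2, 1, 1],
})

row = next(df.itertuples())
# itertuples() puts the DataFrame index first: (Index, MEMBERID, SKU, QTY)
print(row)
# row[1] is the raw MEMBERID, not a position in 0..n_users-1
print(row[1] - 1)              # 8928358160, the index in the error message
print(row.MEMBERID == row[1])  # True
```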
Why don't you convert the values to strings? Even though they are integers, they might be interpreted in scientific notation and thus turn into float values.
Try this:
mergedfinal['cust_id'] = mergedfinal['cust_id'].astype(str)
mergedfinal['item_number'] = mergedfinal['item_number'].astype(str)
mergedfinal['SKU'] = mergedfinal['SKU'].astype(str)
(Here mergedfinal is my DataFrame.)
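Once the IDs are treated as labels (strings), each label still needs a 0-based row or column position before it can index a NumPy array. A minimal sketch, assuming the question's sample data and using `pd.factorize` to do that mapping (a swapped-in technique, not part of the answer above):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'MEMBERID': [8928358161, 8928358161, 8928358161, 8933444371,
                 8932505053, 8932505053, 8932505053, 8921161362,
                 8915688100, 8915688100],
    'SKU': [37101163, 36618858, 40855129, 35010078, 36335949,
            92100668, 36529730, 61814893, 34732853, 35122457],
    'QTY': [2, 1, 1, 1, 1, 1, 2, 1, 1, 1],
})

# Treat the IDs as labels, as in the answer's astype(str) suggestion.
df['MEMBERID'] = df['MEMBERID'].astype(str)
df['SKU'] = df['SKU'].astype(str)

# pd.factorize maps each distinct label to a 0-based code in order of first
# appearance; user_ids / item_ids hold the original label for each code.
user_idx, user_ids = pd.factorize(df['MEMBERID'])
item_idx, item_ids = pd.factorize(df['SKU'])

ratings = np.zeros((len(user_ids), len(item_ids)))
# Like the original loop, duplicate (user, item) pairs overwrite each other;
# np.add.at(ratings, (user_idx, item_idx), df['QTY'].to_numpy()) would sum them.
ratings[user_idx, item_idx] = df['QTY'].to_numpy()

print(ratings.shape)  # (5, 10)
```

Row `i` of `ratings` then belongs to member `user_ids[i]` and column `j` to SKU `item_ids[j]`. If a labelled DataFrame is preferred over a bare array, `df.pivot_table(index='MEMBERID', columns='SKU', values='QTY', aggfunc='sum', fill_value=0)` builds a similar matrix directly.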