I want to construct a training data matrix and a test data matrix for the Book-Crossing dataset. However, the book IDs are ISBN codes, which may contain non-digit characters (such as a trailing X), so I cannot apply this code from a tutorial:
import numpy as np

# Create two user-item matrices, one for training and another for testing
train_data_matrix = np.zeros((n_users, n_items))
for line in train_data.itertuples():
    train_data_matrix[line[1]-1, line[2]-1] = line[3]
    print(line)
test_data_matrix = np.zeros((n_users, n_items))
for line in test_data.itertuples():
    test_data_matrix[line[1]-1, line[2]-1] = line[3]
line[2]-1 raises TypeError: unsupported operand type(s) for -: 'str' and 'int', because line[2] is the ISBN string. Is there another way to build the train and test matrices?
Example of a printed line when iterating over train_data:
Pandas(Index=874192, user_id=20859, ISBN='3442248876X', rating=0, title='Die Krieger der Drachenlanze 06. Die Ritter des Schwerts.', Location='tübingen, baden-württemberg, germany', Age=0.0)
Note: I thought about creating a new book_id column that maps each ISBN to an integer so that the code works, but I don't know how to do it.
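One way to build the book_id column described in the note is pandas.factorize, which assigns each distinct value a 0-based integer code in order of first appearance and also returns the distinct values, so a code can be mapped back to its ISBN. This is a sketch assuming the ratings sit in a DataFrame named df with user_id and ISBN columns, as in the printed row; the rows and the user_idx column name are made up for illustration.

```python
import pandas as pd

# Made-up rows in the shape of the Book-Crossing ratings (illustration only).
df = pd.DataFrame({
    "user_id": [20859, 20859, 276725, 276726],
    "ISBN": ["3442248876X", "034545104X", "3442248876X", "0155061224"],
    "rating": [0, 5, 7, 3],
})

# factorize -> (codes, uniques); codes run 0..k-1 in order of first appearance,
# and isbn_uniques[code] recovers the original ISBN string.
df["book_id"], isbn_uniques = pd.factorize(df["ISBN"])

# Hypothetical extra column: user IDs are not contiguous either, so encode them too.
df["user_idx"], user_uniques = pd.factorize(df["user_id"])

print(df[["user_id", "user_idx", "ISBN", "book_id"]])
```

Since the codes start at 0, they can be used directly as row and column indices, without the -1 offset from the tutorial.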
You should encode the ISBN column, since it contains strings, for example with this snippet:
import pandas as pd

# Map every distinct ISBN to an integer code (0 .. len(isbn_list) - 1)
isbn_list = list(df.ISBN.unique())
df['ISBN'] = pd.Categorical(df.ISBN, categories=isbn_list).codes
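After that, the ISBN column holds integer codes from 0 to len(isbn_list) - 1, so NumPy indexing works. Because the codes are already 0-based, drop the -1 from line[2]-1 (otherwise code 0 wraps around to the last column), set n_items = len(isbn_list), and run the encoding on the full DataFrame before splitting it into train_data and test_data so both use the same codes. If the user IDs are not contiguous from 1 to n_users, encode user_id the same way.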
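Putting it together, here is a minimal end-to-end sketch that encodes both users and ISBNs on the full frame, splits afterwards, and fills each matrix with a single vectorized assignment instead of the itertuples loop. The DataFrame contents, the user_idx/item_idx column names, the build_matrix helper, and the 25% random split via DataFrame.sample are all assumptions for illustration, not the tutorial's own split.

```python
import numpy as np
import pandas as pd

# Made-up rows in the shape of the Book-Crossing ratings (illustration only).
df = pd.DataFrame({
    "user_id": [20859, 20859, 276725, 276726, 276726, 276727],
    "ISBN": ["3442248876X", "034545104X", "3442248876X",
             "0155061224", "034545104X", "0446520802"],
    "rating": [0, 5, 7, 3, 8, 9],
})

# Encode on the FULL frame, before splitting, so train and test share
# the same user -> row and ISBN -> column mapping.
user_cat = df["user_id"].astype("category")
item_cat = df["ISBN"].astype("category")
df["user_idx"] = user_cat.cat.codes  # hypothetical column name
df["item_idx"] = item_cat.cat.codes  # hypothetical column name
n_users = len(user_cat.cat.categories)
n_items = len(item_cat.cat.categories)

# Assumed split: 25% of rows at random go to the test set.
test_data = df.sample(frac=0.25, random_state=0)
train_data = df.drop(test_data.index)

def build_matrix(data):
    """Dense n_users x n_items matrix; codes are 0-based, so no -1 offset."""
    m = np.zeros((n_users, n_items))
    # Integer-array indexing writes every (user, item) rating at once.
    m[data["user_idx"].to_numpy(), data["item_idx"].to_numpy()] = data["rating"].to_numpy()
    return m

train_data_matrix = build_matrix(train_data)
test_data_matrix = build_matrix(test_data)
```

Both matrices have the same shape because n_users and n_items come from the full frame. Note that with np.zeros, a stored rating of 0 (like the one in the printed row) is indistinguishable from a missing rating.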