python, matrix, recommendation-engine, train-test-split

Train/Test Matrix Book Crossing Recommender System


I want to construct a training data matrix and a test data matrix for the Book-Crossing dataset. However, the book IDs are ISBN codes, which may contain letters, so I cannot apply this code (from a tutorial):

# Create two user-item matrices, one for training and another for testing
train_data_matrix = np.zeros((n_users, n_items))
for line in train_data.itertuples():
    train_data_matrix[line[1]-1, line[2]-1] = line[3]
    print(line)

test_data_matrix = np.zeros((n_users, n_items))
for line in test_data.itertuples():
    test_data_matrix[line[1]-1, line[2]-1] = line[3]

line[2]-1 raises a TypeError: unsupported operand type(s) for -: 'str' and 'int', because line[2] is the ISBN string. Is there another way to build the train/test matrices?

Example of printed line when iterating over train_data:

Pandas(Index=874192, user_id=20859, ISBN='3442248876X', rating=0, title='Die Krieger der Drachenlanze 06. Die Ritter des Schwerts.', Location='tübingen, baden-württemberg, germany', Age=0.0)

Note: I thought about creating a new column called book_id that maps each ISBN to an integer so that the code works, but I don't know how to do it.


Solution

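  • The ISBN column contains strings, so encode it as integer codes first, for example with this snippet (recent pandas versions no longer accept a categories= keyword in astype, so the categorical dtype is built explicitly):

    import pandas as pd

    isbn_list = list(df.ISBN.unique())
    df['ISBN'] = df.ISBN.astype(pd.CategoricalDtype(categories=isbn_list)).cat.codes
    

    After that, the NumPy indexing works. Do the encoding on the full data frame before splitting it, so that train and test share the same codes. The codes start at 0, so use line[2] instead of line[2]-1 as the column index, and set n_items to len(isbn_list).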
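
As a fuller illustration of the same idea, here is a minimal sketch that builds both matrices from integer codes. It uses pd.factorize instead of the categorical snippet above (a different route to the same kind of 0-based codes). The toy rows, the two placeholder ISBNs other than the one from the question, the user_idx/book_idx column names and the build_matrix helper are all assumptions made for illustration. User IDs are encoded as well, on the assumption that they are not contiguous either.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the Book-Crossing ratings (hypothetical rows, not real data;
# only the first ISBN comes from the question).
ratings = pd.DataFrame({
    "user_id": [20859, 20859, 31415, 27182],
    "ISBN": ["3442248876X", "0000000001", "3442248876X", "000000002X"],
    "rating": [0, 7, 9, 5],
})

# factorize() assigns 0-based integer codes in order of first appearance
# and also returns the unique values, so codes can be mapped back later.
ratings["user_idx"], user_index = pd.factorize(ratings["user_id"])
ratings["book_idx"], isbn_index = pd.factorize(ratings["ISBN"])

n_users, n_items = len(user_index), len(isbn_index)

# Encode before splitting so that train and test share the same codes.
test_data = ratings.sample(frac=0.25, random_state=0)
train_data = ratings.drop(test_data.index)


def build_matrix(frame, n_users, n_items):
    """Fill a dense user-item matrix from the encoded row/column indices."""
    matrix = np.zeros((n_users, n_items))
    rows = frame["user_idx"].to_numpy()
    cols = frame["book_idx"].to_numpy()
    matrix[rows, cols] = frame["rating"].to_numpy()
    return matrix


train_data_matrix = build_matrix(train_data, n_users, n_items)
test_data_matrix = build_matrix(test_data, n_users, n_items)
```

isbn_index[code] recovers the original ISBN for a given column, which is useful when turning predictions back into book recommendations. For the full dataset a dense matrix of this shape can become very large, so a sparse matrix may be a better fit; that is outside the scope of this sketch.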