python-3.x recommendation-engine lightfm

Lightfm Incorrect number of features in user_features

I am building a hybrid recommender system in LightFM. My data has 39326 unique users and 2569 unique game titles (items).

My train interaction sparse matrix has shape:

<39326x2569 sparse matrix of type '<class 'numpy.float64'>' with 758931 stored elements in Compressed Sparse Row format>

My test interaction sparse matrix has shape:

<39323x2569 sparse matrix of type '<class 'numpy.float64'>' with 194622 stored elements in Compressed Sparse Row format>

I train the model:

model1 = LightFM(learning_rate=0.01, loss='warp')
model1.fit(train_interactions, epochs=20)

which creates the object:

<lightfm.lightfm.LightFM at 0x1bf8c8dc4c8>

But when I try to check accuracy with:

train_precision = precision_at_k(model1, train_interactions, k=10).mean()
test_precision = precision_at_k(model1, test_interactions, k=10).mean()

I get the error message:

Incorrect number of features in user_features

Why? The shapes look compatible to me. What am I missing?

Solution

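Your test sparse matrix has shape 39323x2569 while your train sparse matrix has shape 39326x2569, so the test matrix is missing 3 user rows. The model was fit on 39326 users. When you do not pass user_features, LightFM derives them from the number of rows in the matrix it is given, so a 39323-row test matrix no longer matches the user embeddings the model learned, and you get this error. Both matrices need the same shape, with one row per user even if that user has no interactions in the test set.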
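If you build the matrices yourself with SciPy, one way to keep the shapes aligned is to pass the full shape explicitly instead of letting it be inferred from the largest index present. This is only a minimal sketch on toy data: build_interaction_matrix and the triples are hypothetical names for illustration and are not part of LightFM.

```python
import numpy as np
from scipy.sparse import csr_matrix


def build_interaction_matrix(triples, n_users, n_items):
    """Build a CSR matrix of fixed shape from (user_idx, item_idx, score) triples.

    Hypothetical helper: passing shape=(n_users, n_items) keeps a row for
    every user, even users with no interactions in this split.
    """
    rows = [u for u, _, _ in triples]
    cols = [i for _, i, _ in triples]
    vals = [s for _, _, s in triples]
    return csr_matrix((vals, (rows, cols)), shape=(n_users, n_items), dtype=np.float64)


# Toy data (assumed): 4 users, 3 items; user 3 has no test interactions.
train_triples = [(0, 0, 1.0), (1, 2, 1.0), (2, 1, 1.0), (3, 0, 1.0)]
test_triples = [(0, 1, 1.0), (1, 0, 1.0), (2, 2, 1.0)]

train_matrix = build_interaction_matrix(train_triples, n_users=4, n_items=3)
test_matrix = build_interaction_matrix(test_triples, n_users=4, n_items=3)

# Both shapes agree even though user 3 never appears in the test triples.
print(train_matrix.shape, test_matrix.shape)
```

Without the explicit shape, SciPy would infer the number of rows from the largest user index it sees, which is how a test matrix can end up shorter than the train matrix.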

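I suggest you use LightFM's built-in train/test split function (random_train_test_split in lightfm.cross_validation) to avoid this kind of error, since it returns train and test matrices with the same shape as the input: https://making.lyst.com/lightfm/docs/cross_validation.html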

If you want to split your data in your own way, you can also map your user_id and item_id values to consecutive integers starting from 0, and then use this:

from lightfm.data import Dataset

# train_data and test_data are lists in the format
# [[user_id1, item_id1, score1], ..., [user_idn, item_idn, scoren]]
# The score can be just 1 for an implicit interaction
# user_id and item_id are integers

data = Dataset()
# Fit on ALL users and items, not only those that appear in one split,
# so that both matrices get the same shape
data.fit(unique_user_ids,  # integers from 0 to n_users - 1
         unique_item_ids   # integers from 0 to n_items - 1
        )
# build_interactions returns (interactions matrix, weights matrix)
train, train_weights = data.build_interactions([tuple(i) for i in train_data])
test, test_weights = data.build_interactions([tuple(i) for i in test_data])
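The step of turning user_id and item_id into consecutive integers starting from 0 could be sketched roughly as follows. This is plain Python on made-up data; build_id_maps, reindex, and the raw_train/raw_test lists are hypothetical names, not LightFM functions. The important detail is that the mapping is built from the union of train and test, so a user who appears in only one split still gets an index.

```python
def build_id_maps(*interaction_lists):
    """Map raw user and item ids to consecutive integers starting at 0.

    Hypothetical helper: the ids are collected across ALL lists passed in,
    so train and test share one consistent mapping.
    """
    user_ids = sorted({row[0] for rows in interaction_lists for row in rows})
    item_ids = sorted({row[1] for rows in interaction_lists for row in rows})
    user_map = {raw: idx for idx, raw in enumerate(user_ids)}
    item_map = {raw: idx for idx, raw in enumerate(item_ids)}
    return user_map, item_map


def reindex(rows, user_map, item_map):
    """Rewrite (raw_user, raw_item, score) rows using the integer mappings."""
    return [(user_map[u], item_map[i], s) for u, i, s in rows]


# Toy data (assumed): "carol" only appears in the train split.
raw_train = [("alice", "doom", 1), ("bob", "portal", 1), ("carol", "doom", 1)]
raw_test = [("alice", "portal", 1), ("bob", "doom", 1)]

user_map, item_map = build_id_maps(raw_train, raw_test)
train_rows = reindex(raw_train, user_map, item_map)
test_rows = reindex(raw_test, user_map, item_map)

# These are the lists to hand to Dataset.fit in the snippet above.
unique_user_ids = list(range(len(user_map)))
unique_item_ids = list(range(len(item_map)))
```

The reindexed rows are already tuples of the form (user_id, item_id, score), which is the format build_interactions expects.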