Tags: machine-learning, recommendation-engine, svd, collaborative-filtering

How to split an extremely sparse recommender-system dataset into train/test sets?


I'm using a collaborative filtering algorithm (SVD) on a real-world data set, and I've run into a data sparsity problem: only about 0.01% of the user/item rating matrix is filled. I split the data 80/20 into train and test sets, and found that only a few of the users and items in the test set also appear in the training set, so I can use only a few of the test ratings to calculate RMSE. Could you give me some advice on how to fix this?


Solution

  • For recommender systems, one usually splits each user's history into train and test parts. In more detail:

    1. For each user, write out the items they interacted with.
    2. Preferably, order them by increasing time to avoid the "time-travel" problem: a user can revisit items they already know, so you don't want to test on interactions that happened earlier than the ones you trained on.
    3. Then, for each user, use the first (1-k) fraction of that user's ordered history as the training set and the remaining k fraction as the test set (for example, k = 0.2 for an 80/20 split).
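
The steps above can be sketched in plain Python as follows. This is a minimal illustration, not a reference implementation: the function name `split_per_user`, the `(user, item, rating, timestamp)` tuple layout, and the sample data are all assumptions made for the example. The test size is rounded down, so a user with very few interactions stays entirely in the training set.

```python
from collections import defaultdict


def split_per_user(interactions, test_fraction=0.2):
    """Split (user, item, rating, timestamp) tuples per user, oldest first.

    Hypothetical helper: the tuple layout is an assumption for this sketch.
    """
    # Step 1: group the interactions by user.
    by_user = defaultdict(list)
    for row in interactions:
        by_user[row[0]].append(row)

    train, test = [], []
    for rows in by_user.values():
        # Step 2: order each user's history by increasing timestamp.
        rows.sort(key=lambda r: r[3])
        # Step 3: the last k fraction (rounded down) becomes the test part.
        n_test = int(len(rows) * test_fraction)
        cut = len(rows) - n_test
        train.extend(rows[:cut])
        test.extend(rows[cut:])
    return train, test


# Made-up sample data: u1 has five interactions, u2 has two.
interactions = [
    ("u1", "c", 4, 3), ("u1", "a", 5, 1), ("u1", "e", 1, 5),
    ("u1", "b", 3, 2), ("u1", "d", 2, 4),
    ("u2", "a", 4, 10), ("u2", "c", 5, 11),
]
train, test = split_per_user(interactions, test_fraction=0.2)
print("train:", train)
print("test:", test)
```

With this kind of split, every user in the test set also has earlier interactions in the training set, so their ratings can be scored. Items can still be missing from the training set (for example, an item whose only rating lands in some user's test part), so those test rows may still need to be filtered out or handled separately before computing RMSE.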