Tags: performance, machine-learning, metrics, recommendation-engine

Recommendation System - Recall@K and Precision@K


I am building a recommendation system for my company and have a question about the formulas for calculating precision@K and recall@K, which I couldn't find answered on Google.

With precision@K, the general formula would be the proportion of recommended items in the top-k set that are relevant.

My question is how to define which items are relevant and which are not, because a user doesn't necessarily interact with all available items, only a small subset of them. What if there is a lack of ground truth for the top-k recommended items, i.e. the user hasn't interacted with some of them, so we don't have an actual rating? Should we exclude those items from the calculation or count them as irrelevant?
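To make the two options concrete, here is a minimal sketch (my own illustration, not from the article) where `relevant` is assumed to be the set of items the user interacted with positively and `seen` the set of items with any known interaction:

```python
def precision_at_k(recommended, relevant, seen, k, ignore_unseen=False):
    """precision@k under the two treatments of items with no ground truth.

    recommended: ranked list of recommended item ids
    relevant:    set of items the user interacted with positively
    seen:        set of items the user has any interaction with
    """
    top_k = recommended[:k]
    if ignore_unseen:
        # Option 1 (the article's suggestion): drop items with no ground truth.
        top_k = [item for item in top_k if item in seen]
    if not top_k:
        return 0.0
    # Option 2 needs no extra code: unseen items are simply absent from
    # `relevant`, so keeping them in top_k counts them as irrelevant.
    hits = sum(1 for item in top_k if item in relevant)
    return hits / len(top_k)
```

For example, with `recommended=["a", "b", "c", "d"]`, `relevant={"a"}`, `seen={"a", "b"}` and `k=4`, counting unseen items as irrelevant gives 1/4, while ignoring them gives 1/2 — so the choice clearly matters.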

The following article suggests ignoring these non-interacted items, but I am not sure that is correct.

https://medium.com/@m_n_malaeb/recall-and-precision-at-k-for-recommender-systems-618483226c54


Solution

  • You mention "recommended items", so I'll assume you're talking about calculating precision for a recommender engine, i.e. the proportion of the top-k predictions that are accurate predictions of the user's future interactions.

    The objective of a recommender engine is to model future interactions from past interactions. Such a model is trained on a dataset of interactions such that the last interaction is the target and n past interactions are the features.

    The precision would therefore be calculated by running the model on a test set where the ground truth (the last interaction) is known, and dividing the number of test cases where the ground truth falls within the top-k predictions by the total number of test cases.

    Items that the user has not interacted with don't pose a problem here, because the model is trained on the behaviour of other users, and the evaluation only asks whether the one held-out interaction appears in the top k.