Tags: python, metrics, recommendation-engine, precision-recall

Evaluation Metrics for Recommendation Systems


I am building a collaborative filtering recommendation engine and I am trying to measure how accurate my model is and how good the recommendations are. I test my algorithm with the following steps:

1) Train the model with 3 months of data (t)

2) Recommend items for the next day (t1 = t + 1 day)

3) Calculate Accuracy, Precision, and Recall on the validation set.

For validation I use a 30-day window (t1 + 30 days) and check whether the user interacts with the product.

This is how I measure my model now:

Accuracy: how often a user buys at least one item from my top-5 recommendations

For Precision and Recall I compute the two metrics for every user and then take the mean Precision and Recall over all users:

Precision at top-5 recommendations: correct recommendations / 5

Recall at top-5 recommendations: correct recommendations / known products that the user buys in the 30-day validation window
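The per-user computation described above can be sketched like this (function and variable names are illustrative, not from any library):

```python
def precision_recall_at_k(recommended, relevant, k=5):
    """Precision@k and Recall@k for a single user.

    recommended: ranked list of recommended item ids
    relevant: set of items the user actually bought in the validation window
    """
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Macro-average over all users, as described in the question (toy data)
users = {
    "u1": (["a", "b", "c", "d", "e"], {"a", "c", "x"}),
    "u2": (["f", "g", "h", "i", "j"], {"z"}),
}
pairs = [precision_recall_at_k(rec, rel) for rec, rel in users.values()]
mean_precision = sum(p for p, _ in pairs) / len(pairs)
mean_recall = sum(r for _, r in pairs) / len(pairs)
```

Note that a user with an empty relevant set is assigned recall 0 here; another common choice is to exclude such users from the average entirely.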

Is the way I measure Recall correct?

What does Recall represent in recommendation engines?

Are there any other metrics I can use?


Solution

  • Recall, adapted for recommendation systems, measures the ratio of recommended products that were actually bought by a customer (hits) to the number of products in that customer's test set (|T|):

    Recall = hits / |T|

    This measure is first calculated for each test customer and then averaged over all users in the test set. More information about the basic idea can be found in the paper by Cremonesi et al. (2010), “Performance of Recommender Algorithms on Top-N Recommendation Tasks”, or in an earlier paper by Herlocker et al. (2004), “Evaluating collaborative filtering recommender systems”.

    Other suitable metrics take both precision and recall into account. For example, the F1-score is the harmonic mean of the two measures and can be calculated as

    F1 = 2 · (Precision · Recall) / (Precision + Recall)
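A minimal sketch of this harmonic mean (the input values here are made up for illustration):

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# e.g., combining the averaged Precision@5 and Recall@5 from the question
f1 = f1_score(0.4, 0.25)
```

Because it is a harmonic mean, F1 is pulled toward the smaller of the two values, so a model cannot score well by maximizing only one of them.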

    However, some studies have shown that customers usually read recommendation lists from top to bottom, often perceiving only the few products at the top. To account for this, ranking-based measures such as Mean Average Precision (MAP) can be used.
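MAP rewards placing hits near the top of the list. One common top-k variant of Average Precision can be sketched as follows (names and the normalization choice are illustrative; definitions vary slightly across papers):

```python
def average_precision(recommended, relevant, k=5):
    """Average Precision@k for one user: precision at each hit
    position, averaged over the number of relevant items (capped at k)."""
    hits = 0
    score = 0.0
    for i, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            score += hits / i  # precision at this hit's rank
    denom = min(len(relevant), k)
    return score / denom if denom else 0.0

# MAP is the mean of Average Precision over all test users (toy data)
aps = [average_precision(rec, rel) for rec, rel in [
    (["a", "b", "c", "d", "e"], {"a", "c"}),
    (["f", "g", "h", "i", "j"], {"g"}),
]]
map_score = sum(aps) / len(aps)
```

Unlike plain Precision@5, swapping a hit from position 1 to position 5 lowers this score, which matches the top-to-bottom reading behavior mentioned above.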