algorithm machine-learning recommendation-engine collaborative-filtering

Evaluating the performance of item-based collaborative filtering for binary (yes/no) product recommendations

I'm attempting to write some code for item-based collaborative filtering for product recommendations. The input has buyers as rows and products as columns, with a simple 0/1 flag to indicate whether or not a buyer has bought an item. The output is a list of similar items for a given purchased item, ranked by cosine similarity.
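To make the setup concrete, here is a minimal sketch of that similarity computation in plain Python. It is not my actual implementation; the function names `item_cosine_similarities` and `top_similar` and the tiny example matrix are made up for illustration. It relies on the fact that for 0/1 vectors the cosine similarity of two items reduces to (buyers of both) / sqrt(buyers of i * buyers of j).

```python
from math import sqrt

def item_cosine_similarities(purchases):
    """purchases: list of rows (one per buyer), each a list of 0/1 flags per product.
    Returns an items x items matrix of cosine similarities (diagonal left at 0.0)."""
    n_items = len(purchases[0]) if purchases else 0
    # co[i][j] = number of buyers who bought both i and j; co[i][i] = buyers of i.
    co = [[0] * n_items for _ in range(n_items)]
    for row in purchases:
        bought = [i for i, flag in enumerate(row) if flag]
        for i in bought:
            for j in bought:
                co[i][j] += 1
    sims = [[0.0] * n_items for _ in range(n_items)]
    for i in range(n_items):
        for j in range(n_items):
            # Skip self-similarity and items nobody bought (avoids dividing by zero).
            if i != j and co[i][i] and co[j][j]:
                sims[i][j] = co[i][j] / sqrt(co[i][i] * co[j][j])
    return sims

def top_similar(sims, item, k=3):
    """Indices of the k items most similar to `item`, best first."""
    others = [j for j in range(len(sims)) if j != item]
    return sorted(others, key=lambda j: sims[item][j], reverse=True)[:k]

# Hypothetical data: 4 buyers, 3 products.
purchases = [
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 1],
    [0, 0, 1],
]
sims = item_cosine_similarities(purchases)
print(top_similar(sims, 0, k=2))
```

The nested loops are fine for a small catalogue; a real data set would likely need a sparse matrix representation.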

I am attempting to measure the accuracy of a few different implementations, but I am not sure of the best approach. Most of the literature I find mentions using some form of mean square error, but this really seems more applicable when your collaborative filtering algorithm predicts a rating (e.g. 4 out of 5 stars) instead of recommending which items a user will purchase.

One approach I was considering was as follows...

1. Split the data into training and holdout sets, and train on the training data.
2. For each item (A) in the set, select the data from the holdout set where users bought A.
3. Determine what percentage of A-buyers bought at least one of the top 3 recommendations for A.
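The steps above could be sketched roughly as follows. This is only an illustration under some assumptions: the recommendations have already been produced from the training data, holdout users are represented as sets of purchased items, and the names `hit_rate_per_item`, `recommendations` and `holdout_baskets` are hypothetical.

```python
def hit_rate_per_item(recommendations, holdout_baskets):
    """recommendations: dict mapping item -> list of recommended items (e.g. top 3).
    holdout_baskets: list of sets, one set of purchased items per holdout user.
    Returns dict mapping item -> fraction of its holdout buyers who also bought
    at least one of the items recommended for it."""
    rates = {}
    for item, recs in recommendations.items():
        buyers = [basket for basket in holdout_baskets if item in basket]
        if not buyers:
            continue  # no holdout user bought this item, so nothing to measure
        hits = sum(1 for basket in buyers if any(r in basket for r in recs))
        rates[item] = hits / len(buyers)
    return rates

# Hypothetical example data.
recommendations = {"A": ["B", "C", "D"], "B": ["A", "C"]}
holdout_baskets = [{"A", "B"}, {"A", "E"}, {"A", "C"}, {"B"}]
rates = hit_rate_per_item(recommendations, holdout_baskets)
print(rates)
```

Averaging the per-item rates (or weighting them by the number of buyers) would give a single number per algorithm to compare.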

The above seems kind of arbitrary, but I think it could be useful for comparing two different algorithms when trained on the same data.

Solution

Actually, your approach is quite similar to what the literature does, but I think you should consider using precision and recall, as most of the papers do.

http://en.wikipedia.org/wiki/Precision_and_recall
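As a rough illustration of how these two metrics apply to a top-k recommendation list, here is a small Python sketch. It is not Mahout's implementation; the function name `precision_recall_at_k` and the example data are assumptions made for this example. Here "relevant" means the items a holdout user actually bought.

```python
def precision_recall_at_k(recommended, relevant, k=3):
    """recommended: ranked list of recommended items, best first.
    relevant: set of items the user actually bought in the holdout data.
    Returns (precision, recall) computed over the top k recommendations."""
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    # Precision: share of the recommended items that were actually bought.
    precision = hits / len(top_k) if top_k else 0.0
    # Recall: share of the bought items that were recommended.
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical example: only "C" appears in both the top 3 and the purchases.
precision, recall = precision_recall_at_k(["B", "C", "D", "E"], {"C", "E", "F"}, k=3)
print(precision, recall)
```

Averaging these values over all holdout users gives one precision and one recall figure per algorithm, which makes two implementations trained on the same data directly comparable.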

Moreover, if you use Apache Mahout, there is an implementation of precision and recall in the class GenericRecommenderIRStatsEvaluator.