algorithm machine-learning recommendation-engine collaborative-filtering

How to manage multiple positive implicit feedbacks?

When there are no ratings, a common approach is to use implicit feedback (items bought, pageviews, clicks, ...) to generate recommendations. I'm using a model-based approach, and I'm wondering how to deal with repeated identical feedback for the same item.

As an example, imagine that consumers buy some items more than once. Should I treat the number of feedback events (pageviews, purchases, ...) as a rating, or compute a custom value?

Solution

To model implicit feedback, we usually define a mapping procedure that converts implicit user feedback into explicit ratings. In most domains, I would guess that repeated user actions on the same item indicate that the user's preference for that item is increasing. This is certainly true for music or video recommendation. On a shopping site, however, such behavior might instead indicate that the item is consumed periodically, e.g., diapers or printer ink.

One way I know of to model repeated implicit feedback is to define a numeric rating-mapping function. As the number of feedback events k increases, the mapped rating should increase. At k = 1 you assign a minimal positive rating, for example 0.6, and as k grows the rating approaches 1. You don't have to map to [0, 1]; you can also use integer ratings such as 0, 1, 2, 3, 4, 5.
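A minimal sketch of such a mapping function, assuming a geometric approach toward 1. The function name `implicit_to_rating` and the `base` and `decay` parameters are hypothetical choices for illustration; any monotonically increasing function with r(1) = base and a limit of 1 fits the description above.

```python
def implicit_to_rating(k, base=0.6, decay=0.5):
    """Hypothetical mapping from a feedback count k to a rating in [0, 1).

    k <= 0 means no feedback (rating 0). At k = 1 the rating equals `base`;
    each extra event closes a fraction (1 - decay) of the remaining gap to 1,
    so the rating increases monotonically and approaches 1 as k grows.
    """
    if k <= 0:
        return 0.0
    return 1.0 - (1.0 - base) * decay ** (k - 1)


if __name__ == "__main__":
    for k in range(0, 6):
        print(k, round(implicit_to_rating(k), 4))
```

With the defaults, one event maps to 0.6, two to 0.8 and three to 0.9. To get an integer scale instead, you could multiply by the maximum rating and round, or define the bins directly.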

To give you a concrete example of such a mapping, here is what one work did in the music recommendation domain. In short, they used per-user play-count statistics to define the mapping function:

We assume that the more times the user has listened to an artist, the more the user likes that particular artist. Note that users’ listening habits usually follow a power-law distribution, meaning that a few artists have lots of plays in the user’s profile, while the rest of the artists have significantly fewer plays. Therefore, we compute the complementary cumulative distribution of artist plays in the user’s profile. Artists located in the top 80-100% of the distribution are assigned a score of 5, while artists in the 60-80% range are assigned a score of 4.
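Below is a rough sketch of that per-user binning. It rests on two assumptions not stated in the quote: an artist's "position" in the distribution is taken to be 1 − CCDF(plays), i.e., the fraction of the user's artists with at most that many plays (so the most-played artists sit near 100%), and the lower bands (40-60% → 3, 20-40% → 2, 0-20% → 1) extend the quoted pattern linearly. Bands are treated as half-open, so a position of exactly 80% gets 4. The function names are hypothetical.

```python
def ccdf_above(play_counts, plays):
    """Number of artists in the profile with strictly more plays than `plays`.

    Dividing by len(play_counts) gives the empirical CCDF, P(X > plays).
    """
    return sum(1 for c in play_counts if c > plays)


def artist_scores(profile):
    """Map one user's {artist: play_count} profile to scores 1..5.

    position = 1 - CCDF(plays), i.e. the fraction of the user's artists whose
    play count is <= this artist's. Bands of 20% map to scores:
    (0.8, 1.0] -> 5, (0.6, 0.8] -> 4, ..., (0.0, 0.2] -> 1.
    Integer arithmetic avoids floating-point trouble at band edges.
    """
    counts = list(profile.values())
    n = len(counts)
    scores = {}
    for artist, plays in profile.items():
        at_or_below = n - ccdf_above(counts, plays)  # always >= 1 (the artist itself)
        # ceil(5 * at_or_below / n) computed with integers
        scores[artist] = -(-5 * at_or_below // n)
    return scores


if __name__ == "__main__":
    user_profile = {"a": 100, "b": 4, "c": 3, "d": 2, "e": 1}
    print(artist_scores(user_profile))
```

Because the bins are computed per user, a heavy listener and a light listener both end up using the full 1-5 scale, which is the point of normalizing by each profile's own distribution rather than using raw play counts.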

Another approach I have seen in the literature is to keep a binary preference variable and add a second variable alongside it, called a confidence level, which grows with the amount of feedback. See here for details.
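As an illustration of the confidence idea, here is a sketch of one well-known form of it, from Hu, Koren and Volinsky's implicit-feedback matrix factorization (not necessarily the source linked above). The count r_ui becomes a binary preference p_ui = 1 if r_ui > 0 else 0, plus a confidence c_ui = 1 + α·r_ui, and the model minimizes the confidence-weighted squared error over all user-item pairs, zeros included. The function name, the dense list-of-lists input and the default `alpha`/`reg` values are assumptions for the example; this only evaluates the objective and does not train the factors.

```python
def confidence_weighted_loss(counts, user_factors, item_factors, alpha=40.0, reg=0.1):
    """Confidence-weighted loss for implicit feedback.

    counts[u][i]   -- number of implicit events r_ui (0 = no interaction)
    user_factors   -- list of latent vectors x_u
    item_factors   -- list of latent vectors y_i
    Loss = sum_{u,i} c_ui * (p_ui - x_u . y_i)^2 + reg * (||X||^2 + ||Y||^2)
    """
    loss = 0.0
    for u, row in enumerate(counts):
        for i, r in enumerate(row):
            p = 1.0 if r > 0 else 0.0   # binary preference
            c = 1.0 + alpha * r         # confidence grows with repeated feedback
            pred = sum(x * y for x, y in zip(user_factors[u], item_factors[i]))
            loss += c * (p - pred) ** 2
    # L2 regularization on all latent factors
    loss += reg * (
        sum(x * x for vec in user_factors for x in vec)
        + sum(y * y for vec in item_factors for y in vec)
    )
    return loss


if __name__ == "__main__":
    counts = [[0, 3]]  # one user: item 0 never touched, item 1 bought 3 times
    print(confidence_weighted_loss(counts, [[1.0]], [[0.5], [0.5]], alpha=1.0, reg=0.0))
```

The design choice here is that repeat purchases never change *what* is predicted (the target stays 1), only *how much* an error on that pair costs, which sidesteps the question of what rating a count "means".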