machine-learning mahout recommendation-engine

Mahout Recommender: What relative preference values are suitable for a GenericUserBasedRecommender?

In mahout, I'm setting up a GenericUserBasedRecommender, pretty straight forward for now, typical settings.

In generating a "preference" value for an item, we have the following 5 data points:

Positive interest

User converted on item (highest possible sign of interest)
Normal like (user expressed interest, e.g. like buttons)
Indirect expression of interest (clicks, cursor movements, measuring "eyeballs")

Negative interest

Indifference (items the user ignored when active on other items, a vague expression of disinterest)
Active dislike (thumbs down, remove item from my view, etc)

Over what range I should express these different attributes, let's use a 1-100 scale for discussion?

Should I be keeping the 'Active dislike' and 'Indifference' clustered close together, for example, at 1 and 5 respectively, with all the likes clustered in the 90-100 range?
Should 'Indifference' and 'Indirect expressions of interest' by closer to the center? As in 'Indifference' in the 20-35 range and 'Indirect like' in the 60-70 range?
Should 'User conversion' blow the scale away and be heads and tails higher than the others? As in: 'User Conversion' @ 100, 'Lesser likes' @ ~65, 'Dislikes' clustered in the 1-10 range?
On the scale of 1-100 is 50 effectively "null", or equivalent to no data point at all?

I know the final answer lies in trial and error and in the meaning of our data, but as far as the algorithm goes, I'm trying to understand at what point I need to tip the scales between interest and disinterest for the algorithm to function properly.

Solution

The actual range does not matter, not for this implementation. 1-100 is OK, 0-1 is OK, etc. The relative values are all that really matters here.

These values are estimated by a simple (linearly) weighted average. Therefore the response ought to be "linear". It ought to match an intuition that if action X gets a score 2x higher than action Y, then X should be an indicator of twice as much interest in real life.

A decent place to start is to simply size them relative to their frequency. If click-to-conversion rate is 2%, you might make a click worth 2% of a conversion.

I would ignore the "Indifference" signal you propose. It is likely going to be too noisy to be of use.