I run a music website for amateur musicians where we have a rating system based on a score out of 10, which is then calculated into an overall score out of 100. We have a "credibility" points system for users which directly influences the average score at the point of rating, but the next step is to implement a chart system which uses this data effectively.
I'll try and explain exactly how it all works so you can see which data I have at my disposal.
So the data I have to work with is:
In the chart system I want to create a ranking that uses the above 3 sets of data to create a fair balance between quality (overall rating, normalized with number of ratings) and popularity (number of plays). BUT the system should factor quality more heavily than popularity, so for example the quality aspect makes up 75% of the normalized ranking and popularity 25%.
After a search on this site I found the IMDB Bayesian-style system which is helpful for working out the quality aspect, but how do I add in the popularity (number of plays) and have it balanced in the way I want?
The site is written in PHP and MySQL if that helps.
EDIT: the title says "number of clicks" but this is basically the direct equivalent of "number of plays".
You may want to try the following. The IMDB equation you mentioned uses weighing to lean toward either the average rating of the movie or the average rating of all movies:
WR = (v/(v+m)) × R + (m/(v+m)) × C
So
v << m => v/(v+m) -> 0; m/(v+m) -> 1 => WR -> C
and
v >> m => v/(v+m) -> 1; m/(v+m) -> 0 => WR -> R
This should generally be fair. Calculating a popularity score between 0 and 100 based on the number of plays is pretty tricky unless you really know your data. As a first try calculate the average number of plays avg(p) and the variance var(p) you can then use these to scale the number of plays using a technique call whitening:
WHITE(P) = (p - avg(p))/var(p)
This will give you a score between -1 and 1 by assuming your data looks like a bell curve. You can then scale this to be in the range 0 - 100 by scaling again:
POP = 50 * (1 + WHITE(P))
To combine the score based on some weighting factor w (e.g. 0.75) you'd simply do:
RATING = w x WR + (1 - w) x POP
Play with these and let me know how you get on.
NOTE: this does not account for the fact that a use can "game" the popularity buy playing a track many times. You could get around this by penalising multiple plays of a single song:
deltaP = (1 - (Puser - 1)/TPuser) Where:
So the more times a user plays just the one track the less it counts toward the total number of plays for that track. If the users listening habits are diverse then TPuser will be large and so deltaP will tend back to 1. This still can be gamed but is a good start.