I was recently wondering how http://500px.com calculates their "Pulse" rating. The "Pulse" is a score from 1..100 based on the popularity of the photo.
I think it might use some of the following criteria:
How would I achieve some sort of algorithm like this?
Any advice on how to implement an algorithm with this criteria (and maybe some code) would be appreciated too.
I don't know too much about the site but systems like this generally work the same way. Normalize a set of weighted values to produce a single comparable value.
Define your list of rules, weight them based on importance, then run them all together to get your final value.
In this case it would be something like.
Now we need to normalize the values for those rules. Depending on what your data is, scale of numbers etc this will be different for each rule so we need a workable value, say between 1 and 100.
Example normalizations for the above:
= percentage of vistors out of 50,000 vists (good number of vists)
(vists / 50000 ) * 100
= percentage of likes out of 10,000 likes (good number of likes)
(likes / 10000) * 100
= percentage of vistors that liked it
(likes / vists) * 100
= percentage of likes in last 30 days out of 1,000 likes (good number of likes for a 30 day period)
(likesIn30Days / 1000) * 100
= arbitrary rating of the author
Make sure all of these have a maximum value of 100 (if it's over bring it back down). Then we need to combine all these up depending on their weighting:
Popularity = (1 * 0.1) + (2 * 0.1) + (3 * 0.4) + (4 * 0.2) + (5 * 0.2)
This is all off the top of my head and rough. There are obviously also much more efficient ways to this as you don't need to normalize to a percentage at every stage but I hope it helps you get the gist.
Update
I've not really got any references or extra reading. I've never really worked with it as a larger concept only in small implementations.
I think most of what you read though is going to be methodological ranking systems in general and theories. Because depending on your rules and data format, your implementation will be very different. It seems such a huge concept when actually it will probably come down to arround 10 lines of code, not counting aggregating your data.