php, mysql, math, rating

Conceptual help on implementing a rating system that lets items decrease in rating over time


I am running a website that lets users contribute by uploading files on specific subjects. Right now my rating system is the worst possible one (the number of downloads of the file). Not only is this a highly inaccurate measure of quality, it also prevents new content from reaching the top anytime soon. This is why I want to change my rating system so that users can upvote or downvote each item. However, this should not be the only factor in an item's popularity: I would like older content to decrease in rating over time. Maybe I could even factor in the number of downloads, but with a very low weight.

So, my questions are:

  1. Which formula would you suggest under the assumption that there is 1 new upload every day?
  2. How would you implement this in a php/mysql environment?

My problem is that right now I am simply sorting my items by the downloads column in the database. How can I sort a query by a factor that is calculated externally (in PHP), or do I have to update a new column in my table with the rating factor every time someone loads the site in their browser?

(Please excuse any mistakes, I am not a native speaker)


Solution

  • First of all, in any case, you will need to add at least one column to your table. Better still, use a separate table with id, upvotes, downvotes and datetime columns.
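A minimal sketch of that separate table, using Python's built-in sqlite3 module as a stand-in for MySQL. The table name upload_votes, the column name created_at (holding the upload datetime) and the vote() helper are hypothetical; in MySQL you would typically declare the counters as INT UNSIGNED and the date as DATETIME.

```python
import sqlite3

# Hypothetical schema: one row per upload with its vote counters and upload time.
def create_schema(conn: sqlite3.Connection) -> None:
    conn.execute(
        """
        CREATE TABLE upload_votes (
            id         INTEGER PRIMARY KEY,
            upvotes    INTEGER NOT NULL DEFAULT 0,
            downvotes  INTEGER NOT NULL DEFAULT 0,
            created_at TEXT    NOT NULL
        )
        """
    )

def vote(conn: sqlite3.Connection, upload_id: int, up: bool) -> None:
    # The column name comes from two fixed literals, never from user input.
    column = "upvotes" if up else "downvotes"
    # Increment inside SQL instead of read-modify-write in application code.
    conn.execute(f"UPDATE upload_votes SET {column} = {column} + 1 WHERE id = ?", (upload_id,))

conn = sqlite3.connect(":memory:")
create_schema(conn)
conn.execute("INSERT INTO upload_votes (id, created_at) VALUES (?, ?)", (1, "2024-01-01 12:00:00"))
vote(conn, 1, up=True)
vote(conn, 1, up=True)
vote(conn, 1, up=False)
print(conn.execute("SELECT upvotes, downvotes FROM upload_votes WHERE id = 1").fetchone())  # (2, 1)
```

Incrementing the counter in the UPDATE statement itself means two votes arriving at the same moment cannot overwrite each other.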

    If you want to take the freshness of posts (or uploads, or comments, or...) into consideration, I think the best current method is the Wilson score combined with a gravity parameter that pushes older items down over time.
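A sketch of the idea in Python, assuming the standard lower bound of the Wilson score interval at 95% confidence (z = 1.96). How the linked PHP code applies gravity is not reproduced here: dividing by (age_hours + 2) ** gravity is an assumption borrowed from the widely described Hacker News ranking formula, and hot_score and gravity=1.8 are illustrative names and values.

```python
import math
from datetime import datetime, timezone

def wilson_lower_bound(up: int, down: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for the share of upvotes."""
    n = up + down
    if n == 0:
        return 0.0
    p = up / n
    centre = p + z * z / (2 * n)
    margin = z * math.sqrt((p * (1 - p) + z * z / (4 * n)) / n)
    return (centre - margin) / (1 + z * z / n)

def hot_score(up: int, down: int, uploaded_at: datetime, now: datetime,
              gravity: float = 1.8) -> float:
    """Hypothetical combination: Wilson score damped by a power of the age in hours."""
    age_hours = (now - uploaded_at).total_seconds() / 3600
    return wilson_lower_bound(up, down) / (age_hours + 2) ** gravity

now = datetime(2024, 1, 10, tzinfo=timezone.utc)
old = hot_score(50, 5, datetime(2024, 1, 1, tzinfo=timezone.utc), now)
new = hot_score(5, 0, datetime(2024, 1, 9, tzinfo=timezone.utc), now)
print(new > old)  # True
```

With these values a one-day-old upload with 5 upvotes outranks a nine-day-old one with 50 upvotes and 5 downvotes; a smaller gravity slows that decay down.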

    For a good starting point on implementing the Wilson score in PHP, check this.

    Then read this to understand the pros and cons of other solutions and how to compute the score directly in SQL.
    Note: gravity is not explicitly covered in the SQL code, but with the PHP version as a guide you should be able to add it.
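A sketch of sorting by a computed score inside the query itself, so no extra column has to be refreshed on every page view. It uses the Wilson lower bound written out for SQL with z = 1.96 (hence z²/2 = 1.9208, z²/4 = 0.9604, z² = 3.8416) and then divides by (age_hours + 2) raised to a gravity exponent; that gravity term is an assumption, not part of the linked SQL. Python's sqlite3 stands in for MySQL, and the table and column names are hypothetical. In MySQL, SQRT() and POW()/POWER() are built in, and the age in hours would typically be TIMESTAMPDIFF(HOUR, created_at, NOW()).

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite builds may lack SQRT/POWER, so register Python's versions.
conn.create_function("SQRT", 1, math.sqrt)
conn.create_function("POWER", 2, math.pow)

conn.execute("CREATE TABLE upload_votes (id INTEGER PRIMARY KEY, upvotes INTEGER, "
             "downvotes INTEGER, created_at TEXT)")
conn.executemany(
    "INSERT INTO upload_votes VALUES (?, ?, ?, ?)",
    [
        (1, 50, 5, "2024-01-01 00:00:00"),  # old, well liked
        (2, 5, 0, "2024-01-09 00:00:00"),   # new, few votes
        (3, 1, 9, "2024-01-08 00:00:00"),   # recent, disliked
    ],
)

# Wilson lower bound, then divided by (age_hours + 2) ^ gravity.
# "1.0 *" forces floating-point division, since SQLite divides integers exactly.
QUERY = """
SELECT id,
       ((upvotes + 1.9208) / (upvotes + downvotes)
        - 1.96 * SQRT(1.0 * upvotes * downvotes / (upvotes + downvotes) + 0.9604)
              / (upvotes + downvotes))
       / (1 + 3.8416 / (upvotes + downvotes))
       / POWER((julianday(:now) - julianday(created_at)) * 24 + 2, :gravity) AS score
FROM upload_votes
WHERE upvotes + downvotes > 0
ORDER BY score DESC
"""
rows = conn.execute(QUERY, {"now": "2024-01-10 00:00:00", "gravity": 1.8}).fetchall()
print([row[0] for row in rows])  # [2, 1, 3]
```

For a large table you may still prefer to store the score in a column and refresh it periodically (for example from a cron job), because an ORDER BY on an expression like this cannot use an ordinary index.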

    Note that if you would like something simpler but still not lame, you could look at the Bayesian average. IMDB uses a Bayesian estimate to calculate its Top 250.
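For comparison, a sketch of a Bayesian average using the weighted-rating formula IMDb has historically published for its Top 250, WR = v/(v+m)·R + m/(v+m)·C, where R is the item's mean rating, v its number of votes, C the mean over all items and m the weight given to that prior. Treating each upvote as 1 and each downvote as 0 (so R is the share of upvotes) is an assumption, as are the names bayesian_average and prior_weight and the sample data.

```python
def bayesian_average(up: int, down: int, prior_mean: float, prior_weight: float) -> float:
    """Weighted rating WR = v/(v+m)*R + m/(v+m)*C with R = share of upvotes."""
    v = up + down
    if v == 0:
        return prior_mean
    r = up / v
    return v / (v + prior_weight) * r + prior_weight / (v + prior_weight) * prior_mean

# Hypothetical items: (upvotes, downvotes)
items = {"a": (1, 0), "b": (40, 10), "c": (0, 3)}
total_up = sum(up for up, _ in items.values())
total_votes = sum(up + down for up, down in items.values())
site_mean = total_up / total_votes  # C: share of upvotes across the whole site

ranking = sorted(items, key=lambda k: bayesian_average(*items[k], site_mean, prior_weight=10),
                 reverse=True)
print(ranking)  # ['b', 'a', 'c']
```

The item with a single upvote no longer beats the one with 40 upvotes and 10 downvotes, because few votes keep its score close to the site mean. On its own this has no time decay.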

    Implementing your own statistical model will only result in drawbacks you had not foreseen (scores too far from the mean, downvotes weighing more than upvotes, decaying too quickly, etc.).

    Finally, you are talking about rating the uploads themselves, not the users who upload the files. If you would like to rate users as well, the simplest approach would be a Bayesian estimate built from the ratings of their uploads.
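A sketch of such a user rating, assuming each upload already has a score between 0 and 1 (for example its Wilson score). It is a Bayesian average of the form (m·C + Σ scores) / (m + n), where C is the site-wide mean upload score, n the user's number of uploads and m a prior weight; user_score, prior_weight=5 and the sample data are hypothetical.

```python
from statistics import mean

def user_score(upload_scores: list[float], global_mean: float,
               prior_weight: float = 5.0) -> float:
    """Bayesian average of one user's upload scores, shrunk toward the site mean."""
    n = len(upload_scores)
    return (prior_weight * global_mean + sum(upload_scores)) / (prior_weight + n)

# Hypothetical per-upload scores for each user
scores_by_user = {"alice": [0.9, 0.8, 0.85, 0.9], "bob": [1.0], "carol": []}
site_mean = mean(s for scores in scores_by_user.values() for s in scores)

for user, scores in scores_by_user.items():
    print(user, round(user_score(scores, site_mean), 3))
```

A user with no uploads gets exactly the site mean, and a single well-rated upload moves a user's score only part of the way toward that upload's score.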

    There is a lot to read on this, on Stack Overflow alone, before you have covered the subject thoroughly.

    Your journey starts here...