Evaluation metrics for algorithms that calculate topic hotness


How do you evaluate algorithms that calculate the hotness of a post? That is, how would you know which performs better: an exponential-decay scheme or reddit's algorithm? I understand the question may be a bit naive, but I am looking for performance metrics or cost functions to help with this.


Solution

  • As with the evaluation of any piece of software, you first have to set out the problems it should solve and derive from them the goals you want to achieve. Once you have those, you can start to determine which metrics will provide a useful approximation of progress towards those goals.

    Perhaps you want your site to be great at breaking news. From that you would probably derive goals such as "given sufficient votes, a new post should be able to make it into the top 30 listings within 10 minutes of being posted". Then you can build some test cases and see whether you meet them.

    Or perhaps you want to be the place for the "best" content from across the web. Then your goals will weight user approval more heavily than newness.

    You have to evaluate your own situation to come up with reasonable performance metrics.
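As a minimal sketch of the breaking-news test case described above, the Python below scores posts with two candidate functions and measures the smallest number of upvotes a 10-minute-old post needs to reach the top 30. Everything here is an assumption for illustration: all function names are hypothetical, `reddit_hot` is modeled on the widely cited hot() formula from reddit's old open-source code (offset 1134028003, divisor 45000), `exp_decay_hot` uses a hypothetical 6-hour half-life, and the competing front page is synthetic data, not real traffic.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, List, Optional

# Assumption: constants follow the commonly cited reddit hot() formula.
REDDIT_EPOCH = 1134028003
SECONDS_PER_ORDER = 45000  # 12.5 hours of age is worth a tenfold vote difference


@dataclass(eq=False)  # eq=False: posts compare by identity, so list.index finds *this* post
class Post:
    ups: int
    downs: int
    created: datetime  # timezone-aware (UTC)


HotFn = Callable[[Post, datetime], float]


def reddit_hot(post: Post, now: datetime) -> float:
    """Reddit-style score (assumed formula): log10 of net votes plus a bonus that
    grows with creation time. `now` is unused; it keeps both functions' signatures equal."""
    s = post.ups - post.downs
    order = math.log10(max(abs(s), 1))
    sign = 1 if s > 0 else -1 if s < 0 else 0
    seconds = post.created.timestamp() - REDDIT_EPOCH
    return round(sign * order + seconds / SECONDS_PER_ORDER, 7)


def exp_decay_hot(post: Post, now: datetime, half_life_hours: float = 6.0) -> float:
    """Net votes times an exponential decay in age (hypothetical 6-hour half-life)."""
    age_hours = (now - post.created).total_seconds() / 3600
    return (post.ups - post.downs) * 0.5 ** (age_hours / half_life_hours)


def rank_of(target: Post, pool: List[Post], hot: HotFn, now: datetime) -> int:
    """1-based position of `target` after adding it to `pool` and sorting by hotness."""
    ranked = sorted(pool + [target], key=lambda p: hot(p, now), reverse=True)
    return ranked.index(target) + 1


def min_votes_to_break_in(hot: HotFn, pool: List[Post], now: datetime,
                          top_k: int = 30, age: timedelta = timedelta(minutes=10),
                          max_votes: int = 10_000) -> Optional[int]:
    """Breaking-news test case: smallest upvote count that puts a post of the given
    age into the top `top_k`, or None if `max_votes` is not enough."""
    for votes in range(1, max_votes + 1):
        new_post = Post(ups=votes, downs=0, created=now - age)
        if rank_of(new_post, pool, hot, now) <= top_k:
            return votes
    return None


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
# Synthetic front page (made-up data): 60 two-day-old posts with 20,000 net upvotes each.
pool = [Post(ups=20_000, downs=0, created=now - timedelta(hours=48)) for _ in range(60)]

for name, hot in [("reddit_hot", reddit_hot), ("exp_decay_hot", exp_decay_hot)]:
    print(name, min_votes_to_break_in(hot, pool, now))
```

With this made-up front page, the reddit-style score lets the new post in with 3 upvotes while the 6-hour half-life needs 80. In the reddit-style formula every 45,000 seconds (12.5 hours) of age is worth a tenfold difference in votes, so two days of age discounts the old posts far more than eight half-lives (a factor of 256) do. Neither number says which algorithm is "better"; it only tells you which one meets the goal you wrote down, on the data you chose.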