Search code examples
phpalgorithmsortingranking

Item rankings, order by confidence using Reddit Ranking Algorithms


I am interested to use this ranking class, based off of an article by Evan Miller to rank a table I have that has upvotes and downvotes. I have a system very similar to Stack Overflow's up/down voting system for an events site I am working on, and by using this ranking class I feel as though results will be more accurate. My question is how do I order by the function 'hotness'?

private function _hotness($upvotes = 0, $downvotes = 0, $posted = 0) {
    $s = $this->_score($upvotes, $downvotes);
    $order = log(max(abs($s), 1), 10);

    if($s > 0) {
        $sign = 1;
    } elseif($s < 0) {
        $sign = -1;
    } else {
        $sign = 0;
    }

    $seconds = $posted - 1134028003;

    return round($order + (($sign * $seconds)/45000), 7);
}

I suppose each time a user votes I could have a column in my table that has the hotness data recalculated for the new vote, and order by that column on the main page. But I am interested to do this more on-the-fly incorporating the function above, and I am not sure if that is possible.

From Evan Miller, he uses:

SELECT widget_id, ((positive + 1.9208) / (positive + negative) - 
                   1.96 * SQRT((positive * negative) / (positive + negative) + 0.9604) / 
                          (positive + negative)) / (1 + 3.8416 / (positive + negative)) 
       AS ci_lower_bound FROM widgets WHERE positive + negative > 0 
       ORDER BY ci_lower_bound DESC;

But I rather not do this calculation in the sql as I feel this is messy and difficult to change down the line if I utilize this code on multiple pages .etc.


Solution

  • You are right, query like this is rather messy and expensive as well.

    Mixed PHP/MySQL on the fly is a bad idea well as you will have to select values for all posts and calculate hotness and then select a list of hotest ones. Extremely expensive.

    You should consider saving at least part of your calculation to the database. Definitely order should go to the database. It's always better to calculate something and save just once on every save/update, instead of calculating each time it will be displayed. Try to do a benchmark on how much time you will save by calculating order on save/update instead of every time you calculate the hotness. Good thing is that order never changes unless someone upvotes/downvotes which you save to the db anyway, same for the sign.

    Even if you save the sign to the db you are stil not able to avoid calculating on the fly due to the posted timestamp parameter.

    I would see what difference does it make and where it makes a difference and calculate hotness with a CLI script every x amount of time only for those scripts where this is crucial, every y amount of time where it's making less of a difference.

    Taking this approach you will be recalculating hotness only when necessary. This will make your application much more efficient.