Search code examples
c#algorithmmongodbmapreducereddit

Sorting records by reddit algorithm using mongodb


I'm trying to implement the reddit algorithm as a sorting option in my app but I'm constantly hitting walls all over the place.

I started my implementation using this (Sorting mongodb by reddit ranking algorithm) post as a guide line.

I tried to convert it to c#; below is my attempt at the conversion.

var map = new BsonJavaScript(
    @"function() {

        function hot(score, date){
            var order = log10(Math.max(Math.abs(score), 1));
            var sign = score>0 ? 1 : score<0 ? -1 : 0;
            var seconds = epochSeconds(date) - 1134028003;
            var product = order + sign * seconds / 45000;
            return Math.round(product*10000000)/10000000;
        }

        function log10(val){
            return Math.log(val) / Math.LN10;
        }

        function epochSeconds(d){
            return (d.getTime() - new Date(1970,1,1).getTime())/1000;
        }

        emit( hot(this.VoteCount, this.CreatedAt), this );

    }"
);

var reduce = new BsonJavaScript(
    @"function(){}"
);

var finalize = new BsonJavaScript(
    @"{ 'out': { 'inline': 1 } }"
);

return db.Posts.MapReduce(new MapReduceArgs { MapFunction = map, ReduceFunction = reduce, FinalizeFunction = finalize }).GetResults();

He's the results I'm getting from the implementation;

enter image description here

He's the actual dataset. enter image description here

For some reason the function returns 2 objects instead of 4. Also, what would I need to modify for the function to return the entire post object along with the calculated score?

Would really appreciate it if someone would help me out :)

Thanks in advance, Jean


Solution

  • Fixed it by making 2 modifications.

    These 2 resources where extremely helpful; http://docs.mongodb.org/manual/reference/command/mapReduce/#mapreduce-map-cmd http://docs.mongodb.org/manual/reference/method/db.collection.mapReduce/#db.collection.mapReduce

    Firstly I changed what parameters I pass to emit. I'm assigning a "score" value to the post object on the fly and run the hot function on it. Then I pass the key parameter for emit as the object key and the value parameter as the new post object with the score value. **emit(key, value)

    this.score = hot(this.VoteCount, this.CreatedAt);
    emit( this._id, this );
    

    Then I changed how I get the results to;

    db.Posts.MapReduce(new MapReduceArgs { MapFunction = map, ReduceFunction = reduce}).InlineResults
    

    Hope this is helpful someone else :)

    I'll post benchmarks using this method of calculating the score to calculating the score in C# when I have some free time.


    Alt implementation / Update: I switched to a simpler / faster decay algorithm used by Hacker News as it still meets my requirements. http://amix.dk/blog/post/19574

    Score = (P-1) / (T+2)^G
    
    where,
    P = points of an item (and -1 is to negate submitters vote)
    T = time since submission (in hours)
    G = Gravity, defaults to 1.8 in news.arc