Search code examples
algorithmmongodbrating-system

An algorithm for picking choices from a collection based on their rating?


Let's say I have a collection of objects. I have another collection of likes, each one by a specific user and toward a specific object. Thus, over time through user ratings, each object has a variable amount of likes (all greater than 0).

I want to choose an object from this collection. Objects with more likes should be chosen more frequently, but objects with lower likes should also be sometimes chosen to give them a chance.

The algorithm I have in mind now is to order the objects by likes, and generate a random number, and use the number to choose a random object within a range. Assuming I had a hundred objects, 50% of the time objects from 0-10 are chosen, 25% of the time 10-15, and 25% of the time 15-100.

The obvious problem with this algorithm is scalability. When theirs 1000000s of objects, returning an array of all of them takes time. Does anyone have a better solution? THe database is implemented in mongodb.


Solution

  • I would denormalize a bit and add a 'like' counter field to the objects being liked. Increment it when the objects get liked, decrement it when the objects are un-liked.

    db.test.insert({
        stuff: "likable stuff",
        likes: 7
    })
    

    Then I would also have another field that represents the bucket that object is in as a result of the likes. So, for example, objects start out with this field set to 'ordinary' and after someone got 10 likes they would become 'elite'. (or whatever you want) Update it when they reach that threshold. The idea here is that doing work on the write will make the reads that much easier to do.

    db.test.insert({
        stuff: "likable stuff",
        likes: 7,
        status: "ordinary/elite",
    })
    

    Ok so now selecting the set of objects that are in the groups you defined based on # of likes is easy right? db.collection.find({ status: 'elite' })

    To randomize document selection within these sets you could randomly skip a given amount of records, but that will lead to awful performance and won't scale.

    However, there is a trick you can do whereby you store randomly generated numbers in the documents themselves.

    Let's insert one of these guys into a test db and check it out

    db.test.insert({
        stuff: "likable stuff",
        likes: 7,
        status: "ordinary/elite",
        random: Math.random()
    })
    

    Let's take a look at the document now:

    {
        stuff: "likable stuff",
        likes: 7,
        status: "ordinary/elite",
        random: 0.9375813045563468
    }
    

    Ok, here is where this gets really cool. Do a findOne() query where status: elite and rand_num: $gt { another randomly generated number btw 0 and 1 }.

    db.collection.find({ status: "elite", random: { "$gt": new_rand_num } })

    If the findOne() query doesn't return a result, do it again with $lt as you will be sure to find a document in at least one of the directions.

    Now lets index on status and random.

    db.collection.ensureIndex({ status: 1, random: 1} })

    What do you think?