Search code examples
mysqlarchitecturememcachedvoting-system

Implementing dynamically updating upvote/downvote


How to implement dynamically updating vote count similar to quora:- Whenever a user upvotes an answer its reflected automatically for every one who is viewing that page.

I am looking for an answer that address following:

  • Do we have to keep polling for upvote counts for every answer, If yes then how to manage the server load arising because of so many users
    polling for upvotes.
  • Or to use websockits/push notifications, how scalable are these?
  • How to store the upvote/downvote count in databases/inmemory to support this. How do they control the number of read/writes. My backend database is mysql

The answer I am looking for may not be exactly how quora is doing it, but may be how this can be done using available opensource technologies.


Solution

  • It's not the back-end system details that you need to worry about but the front end. Having connection being open all the time is impractical at any real scale. Instead you want the opposite - to be able to serve and close connection from back-end as fast as you can.

    Websockets is a sexy technology, but again, in real world there are issues with proxies, if you are developing something that should work on a variety of screens (desktop, tablet, mobile) it might became a concern to you. Even good-old long polls might not work through firewalls and proxies.

    Here is a good news: I think

    "keep polling for upvote counts for every answer"

    is a totally good solution in this case. Consider the following:

    • your use-case does not need any real real-time updates. There is little harm to see the counter updated a bit later
    • for very popular topics you would like to squash multiple up-votes/down-votes into one anyway
    • most of the topics will see no up-vote/down-vote traffic at all for days/weeks, so keeping a connection open, waiting for an event that never comes is a waste
    • most of the user will never up-vote/down-vote that just came to read a topic, so your read/write ration of topics stats will be greatly skewed toward reads
    • network latencies varies hugely across clients, you will see horrible transfer rates for a 100B http responses, while this sluggish client is fetching his response byte-by-byte your precious server connection and what is more importantly - thread on a back end server is busy

    Here is what I'd start with:

    • have browsers periodically poll for a new topic stat, after the main page loads
    • keep your MySQL, keep counters there. Every time there is an up/down vote update the DB
    • put Memcached in front of the DB as a write-through cache i.e. every time there is an up/down vote update cache, then update DB. Set explicit expire time for a counter there to be 10-15 minutes . Every time counter is updated expire time is prolongated automatically.
    • design these polling http calls to be cacheable by http proxies, set expire and ttl http headers to be 60 sec
    • put a reverse proxy(Varnish, nginx) in front of your front end servers, have this proxy do the caching of the said polling calls. These takes care of the second level cache and help free up backend servers threads quicker, see network latencies concern above
    • set-up your reverse proxy component to talk to memcached servers directly without making a call to the backend server, yes if your can do it with both Varnish and nginx.
    • there is no fancy schema for storing such data, it's a simple inc()/dec() operation in memcached, note that it's safe from the race condition point of view. It's also a safe atomic operation in MySQL UPDATE table SET field = field + 1 WHERE [...]

    Aggressive multi level caching covers your read path: in Memcached and in all http caches along the way, note that these http poll requests will be cached on the edges as well.

    To take care of the long tail of unpopular topic - make http ttl for such responses reverse proportional to popularity.

    A read request will only infrequently gets to the front end server, when http cache expired and memcached does not have it either. If that is still a problem, add memecached servers and increase expire time in memcached across the board.

    After you done with that you have all the reads taken care of. The only problem you might still have, depending on the scale, is high rate of writes i.e. flow of up/down votes. This is where your single MySQL instance might start showing some lags. Fear not - proceed along the old beaten path of sharding your instances, or adding a NoSQL storage just for counters.

    Do not use any messaging system unless absolutely necessary or you want an excuse to play with it.