Search code examples
servertime-complexityvote

How should I design a poll-style voting sytem?


For reasons which aren't important, I've started to think about the best way to create a simple, unauthenticated poll. The poll would be for a preferred category out of a list of categories (ex. favorite dessert out of brownies, ice cream, and cake), and would not require users to log in. As someone with minimal experience with serverside programming, I have two basic ideas. Both ideas have the client make a POST request to /some/url/vote with the body being something like { vote: "brownies" }. The difference is in how the server handles it.

  1. On the server, store a tally for each category. Each time it receives a request from /some/url/vote, add one vote to the category. Internally might look like: { "brownieVotes": 1, "iceCreamVotes": 16 }
  2. On the server, store a vote for each user that connects to it (would probably have to use ip address, or some type of unique token, to keep track of users). Every time it receives a request from /some/url/vote, it records the user's vote. Internally might look like (using ip address for example) { "1.1.1.1": "brownies", "1.1.1.2", "ice cream" }.

The first idea is simple to implement, but users can vote multiple times and can't change their vote. If this were for something more important than desserts, someone could flood the server with votes.

The second idea is safer than the first because users cannot vote multiple times as easily, but I imagine that with a large poll storing many votes into a .json file (or similar) could cause problems with file read/write times.

Neither of these ideas are especially good, so how would a professional manage this with as few flaws as possible?


Solution

  • The second idea is certainly better than the first. However, it is still not ideal, as for example network administrators often have access to hundreds or even several thousands of IPv4 addresses. With IPv6, this problem is even more extreme, as private people also often have access to many such addresses.

    Therefore, you would have to additionally check which IP address ranges the votes are coming from (which can be checked manually or automatically), and if you register an excessive amount of votes from a certain IP addess range, to not count votes from these addresses.

    The second idea is safer than the first because users cannot vote multiple times as easily, but I imagine that with a large poll storing many votes into a .json file (or similar) could cause problems with file read/write times.

    You are right that this can be a problem with a large number of votes. A naive implementation would simply search the whole file, every time it receives a new vote, in order to determine whether that user has already voted. This corresponds to a time complexity of O(n), which is very inefficient. Such a naive implementation would probably only be acceptable if you have only a small number of votes.

    A much better implementation would use a data structure with a time complexity of O(log n) for search and insertion operations, such as a red-black tree or a B-tree. Such an implementation would be able to handle even very large numbers of votes with an acceptable search and insertion time.

    If you are using a webserver with its own database system (e.g. MySQL or PostgreSQL), then you can use that too, as they use optimal data structures such as those mentioned above.