database-design, relational-database, scale

Which database design scales better?


Which would scale better for recording votes and subscriptions of a question?

A. Two tables, one for subscriptions and one for votes, each with a foreign key to questions. To get all the activity for a question, I'd query each table and combine the two results with a union.

OR

B. One table with a foreign key to questions and a field recording whether the row is a vote or a subscription. In this case some of my other queries will need an extra join condition too.


Solution

  • When you talk about 'scaling', you have to be more precise about which operations you want to scale. Do you need to optimise reads, writes, or both? File dumps? Fetching external content?

    Once you know what you want to scale, you will often also need to figure out how far you want to scale, and whether your tweaks offer any benefit at all, by benchmarking and profiling.
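As a rough illustration of that kind of benchmark, here is a minimal sketch using Python's built-in `sqlite3` and `timeit` modules. It loads the same data into both layouts from the question and times the "all activity for one question" read. The table `activity`, the column `kind`, the index names, and the row counts are all assumptions made up for this example, and timings from an in-memory SQLite database say little about a production server, so treat it as a template rather than a result.

```python
import sqlite3
import timeit


def build_db(n_questions=200, rows_per_question=20):
    """Create an in-memory database holding both candidate layouts."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Layout A: one table per kind of activity
        CREATE TABLE votes (vid INTEGER PRIMARY KEY, qid INTEGER NOT NULL);
        CREATE TABLE subscriptions (sid INTEGER PRIMARY KEY, qid INTEGER NOT NULL);
        -- Layout B: one table with a discriminator column (hypothetical names)
        CREATE TABLE activity (
            aid  INTEGER PRIMARY KEY,
            qid  INTEGER NOT NULL,
            kind TEXT NOT NULL CHECK (kind IN ('vote', 'subscription'))
        );
        CREATE INDEX votes_qid ON votes (qid);
        CREATE INDEX subscriptions_qid ON subscriptions (qid);
        CREATE INDEX activity_qid ON activity (qid);
        """
    )
    for qid in range(1, n_questions + 1):
        for _ in range(rows_per_question):
            conn.execute("INSERT INTO votes (qid) VALUES (?)", (qid,))
            conn.execute("INSERT INTO subscriptions (qid) VALUES (?)", (qid,))
            conn.execute("INSERT INTO activity (qid, kind) VALUES (?, 'vote')", (qid,))
            conn.execute("INSERT INTO activity (qid, kind) VALUES (?, 'subscription')", (qid,))
    conn.commit()
    return conn


# Layout A needs a union of two queries; layout B reads a single table.
QUERY_A = """
    SELECT 'vote' AS kind, vid AS id FROM votes WHERE qid = ?
    UNION ALL
    SELECT 'subscription' AS kind, sid AS id FROM subscriptions WHERE qid = ?
"""
QUERY_B = "SELECT kind, aid AS id FROM activity WHERE qid = ?"


def time_query(conn, sql, params, repeats=200):
    """Return the total seconds taken to run the query `repeats` times."""
    return timeit.timeit(lambda: conn.execute(sql, params).fetchall(), number=repeats)


conn = build_db()
rows_a = conn.execute(QUERY_A, (1, 1)).fetchall()
rows_b = conn.execute(QUERY_B, (1,)).fetchall()
seconds_a = time_query(conn, QUERY_A, (1, 1))
seconds_b = time_query(conn, QUERY_B, (1,))
print(f"A (two tables, UNION ALL):  {seconds_a:.4f}s")
print(f"B (one table, kind column): {seconds_b:.4f}s")
```

Both queries return the same number of rows for a question, so the comparison is like for like. To get numbers that mean something, the same idea would have to be repeated against the real database engine, with realistic data volumes and the real mix of reads and writes.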

    This is fun, depending on the context. Benchmarking on a Saturday night is not all right.

    Some of the database optimisation techniques I am aware of involve denormalisation and similar tricks that you don't see much in ordinary environments. Using those "hacks" for better performance sometimes (often) comes at a price, for example in code maintainability. With the denormalisation technique mentioned above, you lose some of the data integrity your database offers and then have to implement it in your application code.
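To make that trade-off concrete, here is a small sketch in Python with the built-in `sqlite3` module. It keeps a denormalised `vote_count` column on the questions table so that reading the count needs no `COUNT(*)`. The column `vote_count` and the helpers `add_vote` and `counter_is_consistent` are hypothetical names invented for this example; the point is only that the application, not the database, now has to keep the two values in step.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE questions (
        qid        INTEGER PRIMARY KEY,
        vote_count INTEGER NOT NULL DEFAULT 0  -- denormalised copy of COUNT(votes)
    );
    CREATE TABLE votes (
        vid INTEGER PRIMARY KEY,
        qid INTEGER NOT NULL REFERENCES questions (qid)
    );
    """
)


def add_vote(conn, qid):
    """Insert a vote and bump the cached counter in one transaction."""
    with conn:  # commits on success, rolls back if either statement fails
        conn.execute("INSERT INTO votes (qid) VALUES (?)", (qid,))
        conn.execute(
            "UPDATE questions SET vote_count = vote_count + 1 WHERE qid = ?", (qid,)
        )


def counter_is_consistent(conn, qid):
    """Compare the cached counter with the real number of vote rows."""
    cached = conn.execute(
        "SELECT vote_count FROM questions WHERE qid = ?", (qid,)
    ).fetchone()[0]
    actual = conn.execute(
        "SELECT COUNT(*) FROM votes WHERE qid = ?", (qid,)
    ).fetchone()[0]
    return cached == actual


conn.execute("INSERT INTO questions (qid) VALUES (1)")
add_vote(conn, 1)
add_vote(conn, 1)
consistent_after_helper = counter_is_consistent(conn, 1)

# Any write that bypasses the helper leaves the counter stale,
# and the database has no way of noticing.
conn.execute("INSERT INTO votes (qid) VALUES (1)")
consistent_after_bypass = counter_is_consistent(conn, 1)

print(consistent_after_helper, consistent_after_bypass)
```

The read gets cheaper, but every code path that writes a vote must go through `add_vote`, which is exactly the maintainability cost described above.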

    That is not so good for the developer, who has to replicate the database's data integrity just to get faster query times. In light of the above, it is my humble opinion (being a lazy developer) that many scaling problems can be avoided by disregarding scaling in the first place, until problems actually occur.

    If you want to talk about standard, good database design practices from a general SQL perspective, I only know of two main approaches: the normalised way and the non-normalised way. Both have sub-variants, but I guess they would look like your A and your B. Both are valid options with pros and cons, but if your application is not going to be flooded by thousands of hits a minute, I would say use A, maybe something similar to this:

    Table votes - has primary key vid,

    Table questions - has primary key

    Table subscriptions - has primary key sid, foreign keys vid, qid
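One way the layout listed above might look in practice is sketched below with Python's built-in `sqlite3` module. Several details are assumptions that the answer does not state: that the primary key of questions is called `qid`, that votes carries a `qid` foreign key, that the `vid` column on subscriptions is optional, and the `title` column. The sketch also shows the union query from option A and the database itself rejecting a vote for a question that does not exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
conn.executescript(
    """
    CREATE TABLE questions (
        qid   INTEGER PRIMARY KEY,  -- key name assumed from the qid foreign key
        title TEXT NOT NULL         -- hypothetical column
    );
    CREATE TABLE votes (
        vid INTEGER PRIMARY KEY,
        qid INTEGER NOT NULL REFERENCES questions (qid)  -- assumed foreign key
    );
    CREATE TABLE subscriptions (
        sid INTEGER PRIMARY KEY,
        vid INTEGER REFERENCES votes (vid),  -- listed in the answer; nullable is an assumption
        qid INTEGER NOT NULL REFERENCES questions (qid)
    );
    """
)

conn.execute("INSERT INTO questions (qid, title) VALUES (1, 'Which design scales better?')")
conn.execute("INSERT INTO votes (qid) VALUES (1)")
conn.execute("INSERT INTO subscriptions (qid) VALUES (1)")

# Option A: all activity for one question is a union over both tables.
activity = conn.execute(
    """
    SELECT 'vote' AS kind, vid AS id FROM votes WHERE qid = :qid
    UNION ALL
    SELECT 'subscription' AS kind, sid AS id FROM subscriptions WHERE qid = :qid
    """,
    {"qid": 1},
).fetchall()

# The normalised layout lets the database refuse rows that point nowhere.
try:
    conn.execute("INSERT INTO votes (qid) VALUES (999)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(activity, orphan_rejected)
```

With the foreign keys in place, the integrity checks stay in the database and no application code is needed to enforce them.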

    Good luck, and happy coding!