Pros/Cons Using multiple databases vs using single database


I need to design a Windows application that represents multiple "customers" in SQL Server. Each customer has the same data model, but each customer's data is independent.

What are the pros/cons of using multiple databases vs. using a single database?

Which is the best way to do this? If going with a single database, what are the steps to set it up?

Edited:

One thing to note: the database will be hosted in a cloud (Rackspace) account.


Solution

  • Do not store data from multiple customers in the same database -- I have known companies that had to spend a lot of time, effort, and money fixing this mistake. I have even known clients to balk at sharing a database server even though the databases were separate; on the plus side, those clients are generally willing to pay for the extra hardware.

    1. The problems with security alone should prevent you from ever doing this; you will lose large customers because of it.

    2. If you have some customers that are unwilling to upgrade their software, it can be very difficult if you share a single database. Separate databases allow customers to continue using the old database structure until they are ready to upgrade.

    3. You are artificially limiting a natural data partition that could provide significant scalability to your solution. Multiple small customers can still share a database server, they just see their own databases/catalogs, or they can run on separate database servers / instances.

    4. You are complicating your database design because you will have to distinguish customer data that would otherwise be naturally separated, i.e., by having to supply CustomerID in every WHERE clause.

    5. You are making your database slower by having more rows in every table. You will use up database memory more rapidly because CustomerID is now part of every index, so fewer records fit in each index node. The database is also slower because you lose the inherent advantage of locality of reference.

    6. Data rollback for one customer can be very difficult, perhaps essentially impossible as the database grows. You will need custom procedures to do this that are far slower and more resource-intensive than a simple, standard restore from backup.

    7. Large databases can be very difficult to backup / restore in a timely manner, possibly requiring additional spending on hardware to make it fast enough.

    8. Your application(s) that use the database will be harder to maintain and test.

    9. Any mistakes can be much more destructive, because a single mistake can affect all of your clients.

    10. You prevent the possible performance enhancement of low latency by forcing your database into a single location. E.g., overseas customers will be using slow, high-latency networks all the time.

    11. You will be known as the stupid DBA, or the unemployed DBA, or maybe both.
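    Point 4 above can be sketched concretely. This is a minimal illustration only, using SQLite from Python as a stand-in for SQL Server; the table and column names are hypothetical, not from the question:

    ```python
    import sqlite3

    # Minimal sketch: in a shared database, a CustomerID tenant key is forced
    # into every table, and every query must carry the tenant filter.
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE Orders (
            OrderID    INTEGER PRIMARY KEY,
            CustomerID INTEGER NOT NULL,   -- tenant key on every table
            Amount     REAL NOT NULL
        )
    """)
    conn.executemany(
        "INSERT INTO Orders (CustomerID, Amount) VALUES (?, ?)",
        [(1, 10.0), (1, 25.0), (2, 99.0)],
    )

    # Correct: the WHERE clause must always include the tenant filter.
    mine = conn.execute(
        "SELECT Amount FROM Orders WHERE CustomerID = ?", (1,)
    ).fetchall()

    # One forgotten filter and customer 1 is shown customer 2's data.
    leaked = conn.execute("SELECT Amount FROM Orders").fetchall()

    print(len(mine), len(leaked))  # prints "2 3": 2 tenant rows vs. all 3
    ```

    With one database per customer, neither the extra column nor the extra predicate exists, and the leak in the second query is structurally impossible.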
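    The selective-rollback problem from point 6 can also be sketched. Assuming a restored backup copy is attached alongside the live database, one tenant's rows have to be deleted and re-copied table by table -- names here are illustrative, and a real schema would add foreign-key ordering to worry about:

    ```python
    import sqlite3

    # Hypothetical sketch: rolling back ONE customer in a shared database.
    # Instead of a simple full restore, a restored backup copy is attached
    # and that tenant's rows are copied back, per table, per tenant.
    live = sqlite3.connect(":memory:")
    live.execute(
        "CREATE TABLE Orders (OrderID INTEGER, CustomerID INTEGER, Amount REAL)"
    )
    # Attach a second database standing in for the restored backup copy.
    live.execute("ATTACH DATABASE ':memory:' AS backup_db")
    live.execute(
        "CREATE TABLE backup_db.Orders (OrderID INTEGER, CustomerID INTEGER, Amount REAL)"
    )
    live.executemany(
        "INSERT INTO backup_db.Orders VALUES (?, ?, ?)",
        [(1, 1, 10.0), (2, 2, 99.0)],
    )

    TENANT = 1  # the one customer whose data must be rolled back
    for table in ["Orders"]:  # in reality: every tenant-keyed table
        live.execute(f"DELETE FROM {table} WHERE CustomerID = ?", (TENANT,))
        live.execute(
            f"INSERT INTO {table} "
            f"SELECT * FROM backup_db.{table} WHERE CustomerID = ?",
            (TENANT,),
        )

    restored = live.execute("SELECT COUNT(*) FROM Orders").fetchone()[0]
    ```

    With a separate database per customer, all of this collapses into one standard database restore.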

    There are some advantages to a shared database design though.

    1. Common table schemas, code tables, stored procs, etc. need only be maintained and stored in 1 location.

    2. Licensing costs may be reduced in some cases.

    3. Some maintenance is easier, although the combined approach is almost certainly worse overall.

    4. If all/most of your clients are very small, keeping each on its own server means low resource utilization (i.e., a relatively high cost). You can mitigate that cost by combining clients, with their permission and explicit understanding, while still using separate databases for larger clients. You definitely need to be explicit and up-front with your clients in this situation.

    Except for the server cost sharing, this is still a very bad idea -- but cost can be an important consideration too. It is really the only justification for this approach; avoid it if at all reasonable. Maybe you would be better off charging a little more for your product, or simply not supporting tiny customers at a cheap price.
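    The first advantage cuts both ways: with separate databases, a common schema change has to be applied once per customer database. A hedged sketch of that loop, again using SQLite from Python with hypothetical names:

    ```python
    import sqlite3

    # Hypothetical sketch: with one database per customer, a shared schema
    # change is applied by looping over every customer database. The paths
    # and DDL here are illustrative, not from the original answer.
    MIGRATION = "ALTER TABLE Orders ADD COLUMN Currency TEXT DEFAULT 'USD'"

    customer_dbs = [":memory:", ":memory:"]  # in reality: one file per customer
    migrated = []
    for path in customer_dbs:
        conn = sqlite3.connect(path)
        # Stand-in for the pre-existing per-customer schema.
        conn.execute("CREATE TABLE Orders (OrderID INTEGER, Amount REAL)")
        conn.execute(MIGRATION)  # same script, run once per database
        cols = [row[1] for row in conn.execute("PRAGMA table_info(Orders)")]
        migrated.append("Currency" in cols)
        conn.close()
    ```

    The loop is mechanical and easy to script, which is why the "maintain it in one place" advantage of a shared database is usually a weak argument against separate databases.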


    Reading an analysis of the recent Atlassian outage reveals that this mistake is precisely why they are having such trouble recovering.

    There is a problem, though:

    Atlassian can, indeed, restore all data to a checkpoint in a matter of hours.

    However, if they did this, while the impacted ~400 companies would get back all their data, everyone else would lose all data committed since that point.

    So now each customer’s data needs to be selectively restored. Atlassian has no tools to do this in bulk.

    The article also makes it clear that some customers are already migrating away from Atlassian for their OpsGenie product, and will certainly lose future business too. At a minimum, this will be a large problem for their business.

    They also messed up big-time by ignoring their customers during this outage.