Search code examples
sqldatabasedata-warehousedata-vault

Database Architecture, Database Split


I have come across DB architecture which does not feel right to me. It is for a small team of developers... I would appreciate any opinion on this design.

This is a simplified description of the system. All of the 3rd NF databases (Client, Accounting, Rate, Exposure)

We have 4 Normal Form DBs:

Client DB: maintain client & organisation info

Rate DB: getting exchange rates from a 3rd party system

Exposure DB: contacting a 3rd party system for getting our bank account and trade info

Accounting DB: further calculation on the financial risk and forecasting

We have the following databases for data wharehousing

SQL Server Analysis Services: Star Schema

Cube

Database Split: Our 4 Databases (Client, Rate, Exposure, Accounting) are split amount 4 SQL Servers, but they all run on the same physical server. These databases need the data from eachother, for example we have an Organisation Table which is used in all DBs… or Rates are needed in other DBs.

Analysis Services: We have a star schema and Analysis Services. My understanding is that Data Vault could be used as a source for generating Start Schema…. But we are not using our Data Vault for that purpose. We use SSIS to read the data directly from Client, Rate, Exposure and Accounting DBs and populate start schema directly.

Question:

  1. Is Splitting databases a good idea when we need to use the data within those splitted Databases?

  2. Is there a good Source/Blog to explain when it is a good idea for splitting a database?

  3. Is copying tables from source database to destination database a good solution? I feel cross DB queries would be much more simple and efficient than copying so many tables into multiple DBa.


Solution

  • Whether something is a good idea is a question of opinion. What is less so is a question of what the tradeoffs are, and here I can discuss those.

    Splitting databases between departments gives departments greater freedom in deciding their domain models. One of the valid insights of the DDD school of thought is that teams form bounded contexts which may use vocabulary in a way slightly different than others. If giving the various teams more flexibility in deciding their own terminology and data models is a positive thing, that is going for this.

    On the other hand, there are a number of performance drawbacks. Each system knows less about the data distribution in each database, and therefore is less able to effectively plan. Cross-node joins are always expensive. So you get some real downsides as well.

    I think copying data tends to work against the bounded context advantages of division btw and that is a warning. But then this is a tradeoff between autonomy and performance.

    A couple other things to consider:

    1. Would linked servers be a better option than a data vault?
    2. Would some sort of ETL controlled by the destination database be better?