Search code examples
sql-serverasp.net-mvc-4multiple-databases

Architecture for MVC Application for running web reports on data across 2 database servers


I have an ASP.NET MVC 4 application that uses a database in US or Canada, depending on which website you are on.

This program lets you filter job data on various filters and the criteria gets translated to a SQL query with a good number of table joins. The data is filtered then grouped/aggregated.

However, now I have a new requirement: query and do some grouping and aggregation (avg salary) on data in both the Canada Server and US server.

Right now, the lookup tables are duplicated on both database servers.

Here's the approach I was thinking:

Run the query on the US server, run the query again on the Canada server and then merge the data in memory.

Here is one use case: rank the companies by average salary. In terms of the logic, I am just filtering and querying a job table and grouping the results by company and average salary.

Would are some other ways to do this? I was thinking of populating a reporting view table with a nightly job and running the queries against that reporting table. To be honest, the queries themselves are not that fast to begin with; running, the query again against the Canada database seems like it would make the site much slower.

Any ideas?


Solution

  • Quite a number of variables here. If you don't have too much data then doing the queries on each DB and merging is fine so long as you get the database to do as much of the work as it is able to (i.e. the grouping, averaging etc.).

    Other options include linking your databases and doing a single query but there are a few downsides to this including

    • Having to link databases
    • Security associated with a linked database
    • A single query will require both databases to be online, whereas you can most likely work around that with two queries

    Scheduled, prebuilt tables have some advantages & disadvantages but probably not really relevant to the root problem of you having 2 databases where perhaps you should have one (maybe, maybe not).

    If the query is quite slow and called many times, the a single snapshot once could save you some resources provided the data "as at" the time of the snapshot is relevant and useful to your business need.

    A hybrid is to create an "Indexed View" which can let the DB create a running average for you. That should be fast to query and relatively unobtrusive to keep up to date.

    Hope some of that helps.