Search code examples
mysqldata-analysisdata-warehouse

Caching reporting data in same transaction database VS using a data warehouse


We have a SaaS solution, in which each Tenant has his own MySQL database. Now I'm designing the dashboards of this SaaS system and it requires some analytical charts. To get the data needed for charts we could query the transactional data of each tenant from its database in real time. and get updated charts with no bad performance since so far the data volume not that big. However, because the data volume will be growing we decided to separate the analytical and transactional data of each company, we will get the analytical data for the charts in the background, save/caching them and do periodical updates. My question is:

  • What good questions or factors we should considering before deciding whether or not we need to include a data warehouse and data modeling from the beginning or simply Caching the analytical data of the charts resulting from our API in JSON columns in a new table for charts in each tenant's MYSQL database.

Solution

  • Instead of reaching into the "Fact" table for millions of rows, build and maintain a Summary table, then fetch from that. It may run 10 times as fast.

    This does require code changes because of the extra table, but it may be well worth it.

    Summary Tables

    In other words, if the dataset will become bigger than X, Summary tables is the best solution. Caching will not help. Hardware won't be sufficient. JSON only gets in the way.

    Building a year-long graph from a year's worth data points (one per second) is slow and wasteful. Building a year-long graph from daily subtotals is much more reasonable.