Search code examples
azuredata-warehouseolapolap-cubeazure-synapse

Precalculate OLAP cube inside Azure Synapse


We have dimensinal model with fact tables of 100-300 GBs in parquet each. We build PBI reports on top of Azure Synapse (DirectQuery) and experience performance issues on slicing/dicing and especially on calculating multiple KPIs. In the same time data volume is pretty expensive to be kept in Azure Analysis Services. Because of number of dimensions, the fact table can't be aggregated significantly, so PBI import mode or composite model isn't an option as well.

Azure Synapse Analytics faciliates OLAP operations, like GROUP BY ROLLUP/CUBE/GROUPING SETS.

  1. How can I benefit from Synapse's OLAP operations support?
  2. Is that possible to pre-calculate OLAP cubes inside Synapse in order to boost PBI reports performance? How?
  3. If the answer is yes, is that recomended to pre-calculate KPIs? Means moving KPIs definition to DWH OLAP cube level - is it an anti-pattern?

P.S. using separate aggreagations for each PBI visualisation is not an option, it's more an exception from the rule. Synapse is clever enough to take the benefit from materialized view aggregation even on querying a base table, but this way you can't implement RLS and managing that number of materialized views also looks cumbersome.

Upd for @NickW

Could you please answer the following sub-questions:

  1. Have I got it right - OLAP operations support is mainly for downstream cube providers, not for Warehouse performance?
  2. Is spawning Warehouse with materialized views in order to boost performance is considered a common practice or an anti-pattern? I've found (see the link) Power BI can create materialized views automatically based on query patterns. Still I'm afraid it won't be able to provide a stable testable solution, and RLS support again.
  3. Is KPIs pre-calculation at Warehouse side considered as a common way or an anti-pattern? As I understand this is usually done no cube provider side, but if I haven't got one?
  4. Do you see any other options to boost the performance? I can think only about reducing query parallelism by using PBI composite model and importing all dimensions to PBI. Not sure if it'd help.

Solution

  • Synapse Result Set Caching and Materialized Views can both help.

    In the future the creation and maintence of Materialized Views will be automated.

    Azure Synapse will automatically create and manage materialized views for larger Power BI Premium datasets in DirectQuery mode. The materialized views will be based on usage and query patterns. They will be automatically maintained as a self-learning, self-optimizing system. Power BI queries to Azure Synapse in DirectQuery mode will automatically use the materialized views. This feature will provide enhanced performance and user concurrency.

    https://learn.microsoft.com/en-us/power-platform-release-plan/2020wave2/power-bi/synapse-integration

    Power BI Aggregations can also help. If there are a lot of dimensions, select the most commonly used to create aggregations.