Search code examples
mdxpivot-tabledaxtop-nssas-tabular

Efficiently displaying Top N in Excel PivotTable from SSAS Tabular Model


I have a simple tabular model consisting of a fact table with approx. 20 mio. records (sales transactions) and a dimension table with approx 600.000 records (customers).

A typical reporting scenario is to get the top 10 customers over some measure in the fact table, possibly filtered by some other criteria (typically a time period, a product, etc.).

In Excel, aggregating all of the 20 mio. records to return a total sales amount is instant. However, once I try to group by customers, it takes some time (15-20 seconds) to retrieve all the data, which makes sense since there is quite a lot of customers (600.000) that needs to be displayed.

Now, if I apply a value filter in Excel, to get only the top 10 customers, it still takes around 15-20 seconds to return the result, which is unacceptable to my users (as they would like to instantly see the top 10 customers while slicing on other attributes such as product, time, etc).

Internally, Excel uses the TOPCOUNT MDX-function when querying the tabular model with a value filter.

Is there anything I can do in the tabular model, to speed up these kinds of queries?

I've tried:

  • Creating a calculated column on the dimension table, containing the total sales amount for each customer. While the performance is better, this is not the way to go because the values in this column can not be sliced by other attributes from the facts, and the column obviously ends up containing many distinct values (which is a bad thing in tabular).
  • Creating a calculated measure on the fact table, using the DAX RANKX function as suggested here. This caused my Tabular instance to crash (too many records in the dimension table?)
  • Executing a simple DAX statement directly on the tabular model using the DAX TOPN function. This was even slower than the TOPCOUNT MDX-approach, by a factor of 3-4.

Solution

  • After searching the web some more and performing additional tests, I have come to the conclusion that SSAS Tabular is inherently bad at answering TOPCOUNT/RANK-kind queries in an unfiltered context. The reason seems to be quite simple:

    If I have a dimension containing 600.000 customers, and I ask for the top 10 by Sales Amount in an unfiltered context, the tabular engine needs to compute the sum of Sales Amount for each individual customer, before being able to sort and return the top 10 or the rank of each customer.

    In multidimensional OLAP, data is typically preaggregated on a pr. customer level, meaning the multidimensional cube can answer these kinds of queries much faster.

    When applying one or more filters in SSAS tabular (for example current month, a specific product, etc.), I saw a significant performance increase. The solution, in my case, is then to educate my users to always filter their data before including the customer dimension in their PivotTables.