I'm trying to implement some business logic in DAX that has turned out to be quite resource-heavy and complex. I have a very large Power Pivot model (call it "Sales") with numerous dimensions and measures. A simplified view of the Sales model:
+-------+--------+---------+------+---------+-------+
| State | City | Store | Week | Product | Sales |
+-------+--------+---------+------+---------+-------+
| NY | NYC | Charlie | 1 | A | $5 |
| MA | Boston | Bravo | 2 | B | $10 |
| - | D.C. | Delta | 1 | A | $20 |
+-------+--------+---------+------+---------+-------+
Essentially what I'm trying to do is calculate a DISTINCTCOUNT of product by store and week:
SUMMARIZE ( Sales, Sales[Store], Sales[Week], "Distinct Products", DISTINCTCOUNT ( Sales[Product] ) )
+---------+------+-------------------+
| Store | Week | Distinct Products |
+---------+------+-------------------+
| Charlie | 1 | 15 |
| Charlie | 2 | 7 |
| Charlie | 3 | 12 |
| Bravo | 1 | 20 |
| Bravo | 2 | 14 |
| Bravo | 3 | 22 |
+---------+------+-------------------+
I then want to calculate the AVERAGE of these distinct product counts at the store level. I approached this by wrapping the previous calculation in a SUMX and dividing the result by the distinct count of weeks:
SUMX (
    SUMMARIZE ( Sales, Sales[Store], Sales[Week], "Distinct Products", DISTINCTCOUNT ( Sales[Product] ) ),
    [Distinct Products]
) / DISTINCTCOUNT ( Sales[Week] )
+---------+------------------+
| Store | Average Products |
+---------+------------------+
| Charlie | 11.3 |
| Bravo | 18.7 |
+---------+------------------+
I stored this calculation in a measure, and it worked well while the dataset was smaller. Now that the dataset is so large, however, using the measure makes the model hang until I have to cancel the process.
Is there a more efficient way to do this?
SUMX is appropriate in this case, since you want the distinct product count calculated independently for each store and each week, then summed by store, and then divided by the number of weeks per store. There's no way around that. (If there were, I'd recommend it.)
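For a single store in context, the same average can also be written with AVERAGEX over the store/week pairs, letting CALCULATE compute each distinct count. This is a sketch that assumes the Sales table and the Store, Week and Product columns from the question; the measure name is hypothetical. It is still an iterator, so it does not remove the cost described below, but it avoids adding a computed column inside SUMMARIZE, a pattern that is commonly recommended against. Whether it is faster in your model is something to measure rather than assume.

```dax
Average Products (AVERAGEX) =
AVERAGEX (
    // One row per store/week pair that has sales in the current filter context
    SUMMARIZE ( Sales, Sales[Store], Sales[Week] ),
    // Context transition: filter Sales to this store and week, then count products
    CALCULATE ( DISTINCTCOUNT ( Sales[Product] ) )
)
```

At the store level this divides by the number of weeks in which that store has sales, which is what DISTINCTCOUNT ( Sales[Week] ) returns in the original measure (for Charlie, (15 + 7 + 12) / 3 ≈ 11.3). At a grand total the two differ: the original divides the overall sum by the distinct weeks across all stores, while AVERAGEX divides by the number of store/week pairs.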
However, SUMX is an iterator: it evaluates the distinct count once for every store/week combination, so it is the likely cause of the slowdown. Since we can't eliminate the SUMX entirely, the biggest factor here is the number of store/week combinations you have.
To confirm whether the number of store/week combinations is the source of the slowdown, try filtering out or removing 50% of the data from a copy of your model and see whether that speeds things up. If it no longer times out, add data back in to get a sense of how many combinations it takes to reach the failing point.
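Before cutting data, it may help to see how many combinations you're dealing with. This is a sketch of a DAX query that assumes you run it from a query tool such as DAX Studio connected to the model, using the Sales column names from the question:

```dax
// Compare the number of store/week pairs (SUMX iterations) to the raw row count
EVALUATE
ROW (
    "Store/week combinations", COUNTROWS ( SUMMARIZE ( Sales, Sales[Store], Sales[Week] ) ),
    "Rows in Sales", COUNTROWS ( Sales )
)
```

The first number is roughly how many times the SUMX in the measure has to evaluate DISTINCTCOUNT when no filters are applied, for example at the grand total.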
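To make things faster with the full dataset, one option is to precompute the distinct product count for each store and week in a calculated table (calculated tables are available in Power BI Desktop and SSAS Tabular 2016 or later, but not in Excel's Power Pivot):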
Calculated Table =
SUMMARIZE (
Sales,
Sales[Store],
Sales[Week],
"Distinct Products", DISTINCTCOUNT ( Sales[Product] )
)
Note: The calculated table code above is rudimentary and is mostly meant as a proof of concept. If this is the path you take, make sure you have a separate store dimension to relate the calculated table to. It won't relate to the source table directly, because neither table has unique Store values.
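A minimal sketch of such a store dimension, assuming a calculated table is acceptable for it too (the table name Stores is hypothetical):

```dax
// Hypothetical store dimension: one row per distinct Store value found in Sales
Stores =
DISTINCT ( Sales[Store] )
```

You would then create one-to-many relationships from Stores[Store] to Sales[Store] and from Stores[Store] to 'Calculated Table'[Store], and put Stores[Store] (rather than either table's own Store column) on rows, so that a store selection filters both tables.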
Measure Using Calc Table =
SUMX (
'Calculated Table',
[Distinct Products] / DISTINCTCOUNT ( 'Calculated Table'[Week] )
)
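The DISTINCTCOUNT inside that SUMX has no CALCULATE around it, so it is evaluated in the outer filter context and returns the same number of weeks on every iteration. Dividing each row by a constant and then summing gives the same result as summing first and dividing once, so the measure can also be written without an iterator. This is a sketch against the same 'Calculated Table', with a hypothetical measure name:

```dax
Measure Using Calc Table (no iterator) =
DIVIDE (
    // Total of the precomputed store/week distinct counts in the current context
    SUM ( 'Calculated Table'[Distinct Products] ),
    // Distinct weeks in the current context (e.g. the weeks for the selected store)
    DISTINCTCOUNT ( 'Calculated Table'[Week] )
)
```

Both forms should return the same values, and DIVIDE returns BLANK when the denominator is zero. This version just avoids a row-by-row evaluation, which is worth comparing on the full dataset.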
Jason Thomas has a great post on calculated tables and when they come in useful: http://sqljason.com/2015/09/my-thoughts-on-calculated-tables-in.html.
If you can't use calculated tables but your data comes from a database of some form, you could implement the same logic in SQL and import a pre-prepared table of unique store/week combinations and their distinct product counts.
I hope some of this proves useful (or you've solved the problem another way).