Search code examples
ssaspowerbidaxtabular

Optimizing Summarize in DAX


I have this DAX formula that gives me a count of id that appear on the fact table in a month, averaged over the year. I can put this measure is a table ad it's unpacked by row with no issues (by adding variables from dimensions)

Measure:= AVERAGEX(
    SUMMARIZE(
        CALCULATETABLE(fact_table;FILTER('Time_Dimension';'Time_Dimension'[Last_month] <> "LAST"));
         Time_Dimension[Month Name];
        "Count";DISTINCTCOUNT(fact_table[ID])
    );
    [Count]
)

But it's terrible slow (I have 3 measures like this on a single table) and the fact table is big (like 300Million rows big)

I was reading that SUMMARIZE perform really bad with aggregations and It should be replaced with SUMMARIZECOLUMNS. I wrote this formula

Measure_v2:= AVERAGEX(
    SUMMARIZECOLUMNS(
        Time_Dimension[Month Name];
        FILTER(Time_Dimension;
            Time_Dimension[Month Name]<>"LAST"
        );
        "Count";DISTINCTCOUNT(fact_table[ID])
    )
    [Count]
)

And it works when I visualize the measure as it is, but when I try to put it in a context (like the table above) it gives me the error "Can't use SUMMARIZECOLUMN and ADDMISSINGITEMS() in this context" How can I make a sustainable optimization from the original SUMMARIZE function?


Solution

  • Before optimizing SUMMARIZE, I would re-visit the overall approach. If your goal is to calculate average fact count per year-month, there is a simpler (and faster) way.

    [ID Count]:=CALCULATE(COUNT('fact_table'[ID]),'Time_Dimension'[Last_month] <> "LAST") 
    
    [Average ID Count]:=AVERAGEX( VALUES('Time_Dimension'[Year_Month]), [ID Count`])
    

    assuming that:

    • you have year-month attribute in your time dimension;
    • IDs in your fact table are unique (and therefore, simple count is enough)

    If this solution does not solve your problem, then please post your data model - it's hard to optimize without knowing the data structure.

    On a side note, I would remove ID field from the fact table. It adds no value to the model, and consumes huge amounts of memory. Your objective can be achieved by simply counting rows:

    [Fact Count]:=CALCULATE(COUNTROWS('fact_table'),'Time_Dimension'[Last_month] <> "LAST")