Search code examples
ssasdaxweighted-average

DAX weighted average based on another dimension


I have the following data:

DATE        COUNTRY     ITEM        Value
2005-01-01  UK          op_rate     30%
2005-01-01  UK          proc        1000
2005-01-01  UK          export      750 
2005-01-01  ITA         op_rate     45%
2005-01-01  ITA         proc        500 
2005-01-01  ITA         export      350

Basically, data in normal format, which includes both ratios (the op_rate) and other items such as exported volumes and processed volumes ("proc").

I need to aggregate by SUM for "proc" and "export", but not for the "op_rate", for which I need a weighted average by "proc".

In this case the aggregated op_rate would be: 0.45*500 + 0.30*1000 = 0.35 // instead of a .75 SUM or 0.375 AVERAGE

All example I find for weighted average are across measures, but none covers using other dimensions.

Any help most welcome!


Solution

  • I understand that you are reluctant to change your model. The problem you have here is that you are trying to consume a highly normalised table and use it for analysis using an OLAP tool. OLAP tools prefer Fact/Dim star schemas and Tabular/PowerBI is no different. I suspect that this is going to continue to problems with future requirements too. Taking the hit on changing the structure now is the best time to do it as it will get more difficult the longer you leave it.

    This isn't to say that you can't do what you want using the tools, but the resulting dax will be less efficient and the storage required will be sub-optimal.

    So with that caveat/lecture given (!) here is how you can do it.

    op_rate_agg =
    VAR pivoted =
        ADDCOLUMNS (
            SUMMARIZE ( 'Query1', Query1[COUNTRY], Query1[DATE] ),
            "op_rate", CALCULATE ( AVERAGE ( Query1[Value] ), Query1[ITEM] = "op_rate" ),
            "proc", CALCULATE ( SUM ( Query1[Value] ), Query1[ITEM] = "proc" )
        )
    RETURN
        DIVIDE ( SUMX ( pivoted, [op_rate] * [proc] ), SUMX ( pivoted, [proc] ) )
    

    It is really inefficient as it is having to build your pivoted set every execution and you will see that the query plan is having to do a lot more work than it would if you persisted this as a proper Fact table. If your model is large you will likely have performance issues with this measure and any that references it.