Search code examples
sql-serverapache-superset

Compute percentile as a metric in Apache Superset with SQL Server backend


Similar to Superset Computing 90th percentile response time, except with a SQL Server data source. SQL Server has a different syntax for its percentile functions where the variable is not specified in the main function call and I can't figure out how to input it as a Superset metric.

I have Type as a Dimension in my Superset chart. I put AVG([Time]) as a metric, I get that value for each Type row. Great. Now to do 90th percentile.

For 90th percentile, I try

PERCENTILE_CONT(0.9) WITHIN GROUP (ORDER BY [Time]) OVER (PARTITION BY [Type])

That errors with

"Column 'dbo.vw_Fire_Dashboard.Time' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause

I've tried removing the OVER ... at the end as I did to adapt other metrics to Superset, but then it complains that

'PERCENTILE_CONT' must have an OVER clause.


Solution

  • This can't be done with a Superset metric with SQL Server. But you can do it with Jinja templating. Here the dashboard user can select filter values of Alarm Date and Status and they'll be passed into the SQL query before the 90th percentile is calculated.

    Enable the Jinja templating feature flag, then create a virtual dataset from SQL Lab. Save it and then edit it to look like this:

    SELECT DISTINCT
    [Incident Grouping],
    PERCENTILE_CONT(0.9) WITHIN GROUP (ORDER BY [Time (Minutes)]) OVER (PARTITION BY [Incident Grouping]) AS [90th Percentile]
    FROM [database_name].[dbo].[table_name
    WHERE 1 = 1
    AND [Alarm Date] > '{{ from_dttm }}' AND [Alarm Date] < '{{ to_dttm }}'
    AND [Status] in {{ filter_values('Status')|where_in }}
    

    Incident Grouping is the variable I want to group by and I have dashboard native filters on my Superset dashboard that control the Alarm Date and Status variables in my table.

    It's fussy to create and edit. This chart won't load in edit mode because of the Jinja, it's only visible in dashboard mode. And to sync the columns or change their name after the AS I have to remove the DISTINCT and the Jinja filter lines . That's why I have the WHERE 1 = 1, so I can comment out those filter lines, sync the columns, and then uncomment them.