Search code examples
sqlteradataanalytic-functions

Obtaining multiple percentiles (percentile_cont equivalent) in one pass within Teradata


I understand that we can rewrite percentile_cont within Teradata as:

SELECT
  part_col
  ,data_col
   + ((MIN(data_col) OVER (PARTITION BY part_col ORDER BY data_col ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING) - data_col)
       * (((COUNT(*) OVER (PARTITION BY part_col) - 1) * x) MOD 1)) AS percentile_cont
FROM tab
QUALIFY ROW_NUMBER() OVER (PARTITION BY part_col ORDER BY data_col)
    = CAST((COUNT(*) OVER (PARTITION BY part_col) - 1) * x AS INT) + 1;

See this very helpful discussion for more information.

Understanding that replacing x with 0.90 would return the 90th percentile, is there an elegant way of extending this and returning multiple percentiles in one pass?

For example, say I want to extend this example and return the 25th, 50th, and 75th percentiles in one pass? Is this possible? Seems like I would need multiple QUALIFY statements? Similarly, if I desire multiple GROUP BY equivalents, is this akin to passing more columns in the PARTITION BY?


-- SQL:2008 Equivalent pseudo-code
SELECT
  part_col_a
 ,part_col_b
 ,PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY order_col) AS p25
 ,PERCENTILE_CONT(0.50) WITHIN GROUP (ORDER BY order_col) AS p50
 ,PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY order_col) AS p75
FROM tab
GROUP BY
  part_col_a
  ,part_col_b

Solution

  • You should fully read my blog, the final query is doing exactly what you want :-)

    SELECT part_col
      ,MIN(pc25) OVER (PARTITION BY part_col) AS quartile_1
      ,MIN(pc50) OVER (PARTITION BY part_col) AS quartile_2
      ,MIN(pc75) OVER (PARTITION BY part_col) AS quartile_3
    FROM
     (
      SELECT
        part_col
        ,COUNT(*)      OVER (PARTITION BY part_col) - 1 AS N
        ,ROW_NUMBER()  OVER (PARTITION BY part_col ORDER BY data_col) - 1 AS rowno
        ,MIN(data_col) OVER (PARTITION BY part_col ORDER BY data_col ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING) - data_col AS diff
    
        ,CASE
           WHEN rowno = CAST(N * 0.25 AS INT)
           THEN data_col +(((N * 0.25) MOD 1) * diff)
         END AS pc25
    
        ,CASE
           WHEN rowno = CAST(N * 0.50 AS INT)
           THEN data_col +(((N * 0.50) MOD 1) * diff)
         END AS pc50
    
        ,CASE
           WHEN rowno = CAST(N * 0.75 AS INT)
           THEN data_col +(((N * 0.75) MOD 1) * diff)
         END AS pc75
      FROM tab
      QUALIFY   rowno = CAST(N * 0.25 AS INT)
           OR   rowno = CAST(N * 0.50 AS INT)
           OR   rowno = CAST(N * 0.75 AS INT)
     ) AS dt
    QUALIFY ROW_NUMBER() OVER (PARTITION BY part_col ORDER BY part_col) = 1