Search code examples
sqlhivepercentile

Creating deciles in SQL


I'm trying to bucket my data into deciles, but not in the traditional sense where the dimension is the basis of the decile.

I have 463 unique it_scores ranging from 316-900 (my dimension) with 1,296,070 trade_counts (my measure) total. Using the following code breaks my data into 10 buckets with 47 unique it_scores:

ntile(10) over (order by it_score)) as tileno

While this is definitely doing what it's supposed to, I need my buckets to be built on the basis of total trade_counts, with each bucket containing about 129.6k observations. The it_score is still the dimension but the ranges wouldn't necessarily be equal i.e. decile 10 might have a range of 316-688 with 129.6k observations while decile 9 might be 689-712 also with 129.6k observations.

How would I achieve that?


Solution

  • SUM(trade_count) OVER (ORDER BY it_score) to assign deciles based on cumulative trade_counts.

    SELECT
      decile,
      SUM(trade_count) AS decile_trade_count
    FROM
      (
        SELECT
          it_score,
          trade_count,
          FLOOR(
            (SUM(trade_count) OVER (ORDER BY it_score) - 1) / (SUM(trade_count) OVER ()) * 10
          ) + 1 AS decile
        FROM table
      ) sub
    GROUP BY decile
    ORDER BY decile;