
Create buckets in Snowflake without a specified max value


At the moment I'm using this in my query to get order value bins:

    case
      when (oi.pricePerItemExVat * oia.quantity) >= 0 and (oi.pricePerItemExVat * oia.quantity) <= 10000 then '0-10000'
      when (oi.pricePerItemExVat * oia.quantity) > 10000 and (oi.pricePerItemExVat * oia.quantity) <= 20000 then '10001-20000'
      when (oi.pricePerItemExVat * oia.quantity) > 20000 and (oi.pricePerItemExVat * oia.quantity) <= 30000 then '20001-30000'
      when (oi.pricePerItemExVat * oia.quantity) > 30000 and (oi.pricePerItemExVat * oia.quantity) <= 40000 then '30001-40000'
      when (oi.pricePerItemExVat * oia.quantity) > 40000 and (oi.pricePerItemExVat * oia.quantity) <= 50000 then '40001-50000'
      when (oi.pricePerItemExVat * oia.quantity) > 50000 then 'over 50000'
    end as orderVolumeBins

Instead I want to use this function:

WIDTH_BUCKET( <expr> , <min_value> , <max_value> , <num_buckets> )

Which in my case could be

WIDTH_BUCKET(oi.pricePerItemExVat * oia.quantity, 0, 50000, 5)

But that wouldn't give me the bin with all orders that have volumes above 50000. Does anyone know if there's a variant where this is possible?


Solution

  • "But that wouldn't give me the bin with all orders that have volumes above 50000"

    It would. From the WIDTH_BUCKET documentation:

    When an expression falls outside the range, the function returns:

    • 0 if the expression is less than min_value.

    • num_buckets + 1 if the expression is greater than or equal to max_value.

    For 50,001:

    SELECT WIDTH_BUCKET( 50001 , 0 , 50000 , 5 )
    -- 6
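
To see the boundary behavior in one place, here is a small sketch that runs WIDTH_BUCKET over a few hand-picked values. The sample values and the aliases v and bucket are made up for illustration; column1 is the default name Snowflake gives the first column of a VALUES clause. The bucket numbers in the comments follow from the rules quoted above, with each bucket including its lower bound and excluding its upper bound.

```sql
-- Hypothetical sample values chosen to sit on and around the bucket edges.
SELECT
    column1                             AS v,
    WIDTH_BUCKET(column1, 0, 50000, 5)  AS bucket
FROM VALUES (-1), (0), (9999), (10000), (49999), (50000), (50001)
ORDER BY v;
-- v = -1     -> 0  (less than min_value)
-- v = 0      -> 1
-- v = 9999   -> 1
-- v = 10000  -> 2  (lower bound of a bucket is inclusive)
-- v = 49999  -> 5
-- v = 50000  -> 6  (greater than or equal to max_value)
-- v = 50001  -> 6
```

Note that exactly 10000 lands in bucket 2, whereas the CASE expression in the question puts it in '0-10000'. If that edge matters, the labels should be chosen accordingly.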
    

    EDIT:

    "Can I name the bins in any way? Now I'm getting 1,2,3,4,5,6 instead of 10k-20k etc."

    Yes, map the bucket number to a label with a CASE expression:

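    SELECT
       CASE WIDTH_BUCKET(oi.pricePerItemExVat * oia.quantity, 0, 50000, 5)
         WHEN 1 THEN '0-10000'      -- bucket 1 covers 0 <= value < 10000
         WHEN 2 THEN '10001-20000'  -- bucket 2 covers 10000 <= value < 20000
         -- ...
         WHEN 5 THEN ...
         WHEN 6 THEN '>=50000'      -- overflow bucket: value >= 50000
       END AS orderVolumeBins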
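
If writing out every label by hand gets tedious, the labels can also be derived from the bucket number. The following is only a sketch under assumptions: the order_values and bucketed CTEs and their sample rows are invented stand-ins for the real oi/oia join, the bucket width of 10000 is simply (50000 - 0) / 5, and the label wording is arbitrary. It also gives bucket 0 (negative values) its own label, a case the CASE expression in the question leaves as NULL.

```sql
-- Hypothetical stand-in for oi.pricePerItemExVat * oia.quantity.
WITH order_values AS (
    SELECT column1 AS order_value
    FROM VALUES (-5), (500), (10000), (25000), (49999), (50000), (75000)
),
bucketed AS (
    SELECT
        order_value,
        WIDTH_BUCKET(order_value, 0, 50000, 5) AS bucket
    FROM order_values
)
SELECT
    order_value,
    bucket,
    CASE
        WHEN bucket = 0 THEN 'below 0'           -- underflow bucket
        WHEN bucket = 6 THEN '50000 and over'    -- overflow bucket (num_buckets + 1)
        -- Buckets 1..5: lower bound inclusive, upper bound exclusive.
        ELSE TO_VARCHAR((bucket - 1) * 10000) || ' to <' || TO_VARCHAR(bucket * 10000)
    END AS orderVolumeBins
FROM bucketed
ORDER BY order_value;
```

Computing the bucket once in its own CTE keeps the WIDTH_BUCKET arguments in a single place, so changing the range or the number of buckets only requires updating that call and the width used in the label.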