Search code examples
sqlsnowflake-cloud-data-platformfeature-engineering

How to do a MinMax Scale in Snowflake column and still maintain overall sum of column?


I currently have a challenge with Snowflake, where I have a PRICE column like the bellow, the goal is to "scale" this values but keep the original sum intact, like, I do not need to respect the proportions on the scale, but lowest value should continue to be the lowest and so on. Also I guess this can be a SQL problem.

I tried running a script using WIDTH_BUCKET but there is no option to maintain the total sum.

`SELECT 
    sale_date, 
    price,
    WIDTH_BUCKET(price, 200000, 600000, 5) AS "SALES GROUP"
  FROM home_sales
  ORDER BY sale_date;`

`+------------+-----------+-------------+
| SALE_DATE  |     PRICE | SALES GROUP |
|------------+-----------+-------------|
| 2013-08-01 | 10        |           1 |
| 2014-02-01 | 20        |           2 |
| 2015-04-01 | 30        |           3 |
| 2016-04-01 | 10        |           1 |
| 2017-04-01 | 50        |           4 |
| 2018-04-01 | 60        |           5 |
+------------+-----------+-------------+`

The output(SALES GROUP column) I am really looking for is like this: By this example the total SUM of column price is 180.

`+------------+-----------+-------------+
| SALE_DATE  |     PRICE | SALES GROUP |
|------------+-----------+-------------|
| 2013-08-01 | 10        |          12 |
| 2014-02-01 | 20        |          24 |
| 2015-04-01 | 30        |          28 |
| 2016-04-01 | 10        |          12 |
| 2017-04-01 | 50        |          47 |
| 2018-04-01 | 60        |          57 |
+------------+-----------+-------------+`

If we sum the SALES GROUP column, it still has a total of 180. *OBS: I know this is not exactly scaling, but I am new to snowflake and did not find the most correct term.


Solution

  • How about scaling up the natural log to match the sum?

    select *, (ln(price) * sum(price) over() / sum(ln(price)) over())::int as scaled_price
    from t;