Search code examples
sqlsnowflake-cloud-data-platformwindow-functions

Snowflake SQL | How to sum over distinct values in one pass?


I am working with survey response data, where a user could have taken the survey multiple times. Users have a "weight" assigned to them, which is an integer. I am trying to calculate the sum of user_weight by city, making sure I don't double count the user_weight for users who have taken the survey multiple times.

Using the example data I provided here, note that each of the users here took the survey twice.

enter image description here

I can calculate the correct city_weight using a two-pass approach. Below is snowflake sql code using a CTE that de-dupes the (city, user, user_weight) tuples first and then sums up the user_weight by city. This approach works, and I get the correct weights (of 150 for chicago, and 55 for denver).

Is there a one-pass approach to solve this problem?

create or replace temp table temp as 
select 1 as response_id, 1 as user_id, 50 as user_weight, 'chicago' as city
union select 2, 1, 50, 'chicago'
union select 3, 2, 100, 'chicago'
union select 4, 2, 100, 'chicago'
union select 5, 3, 30, 'denver'
union select 6, 3, 30, 'denver'
union select 7, 4, 25, 'denver'
union select 8, 4, 25, 'denver'
;

with base as (
select
 distinct
    city,
    user_id,
    user_weight
 from
    temp
)
select
    city,
    sum(user_weight) as city_weight
from
    base
group by city
;


Solution

  • You can dedupe with a group by, then run a windowed sum on top of that, then add a distinct to collapse the window. That said, what you have is perfectly fine and more self-explanatory

    select distinct 
           city,
           sum(user_weight) over (partition by city) as city_weight
    from temp
    group by city, user_id, user_weight;