sql, data-science, snowflake-cloud-data-platform, data-analysis, data-partitioning

How to count distinct values with partition by and order by in Snowflake SQL?


My data is as follows:

| user | eventorder| postal|
|:---- |:---------:| -----:|
| A    | 1         | 60616 |
| A    | 2         | 10000 |
| A    | 3         | 60616 |
| B    | 1         | 20000 |
| B    | 2         | 30000 |
| B    | 3         | 40000 |
| B    | 4         | 30000 |
| B    | 5         | 20000 |

The problem I need to solve: for each user, how many distinct stops has that user travelled to up to and including each event order?

The ideal result should be as follows:

| user | eventorder| postal| travelledStop|
|:---- |:---------:| -----:| ------------:|
| A    | 1         | 60616 |  1    |
| A    | 2         | 10000 |  2    |
| A    | 3         | 60616 |  2    |
| B    | 1         | 20000 |  1    |
| B    | 2         | 30000 |  2    |
| B    | 3         | 40000 |  3    |
| B    | 4         | 30000 |  3    |
| B    | 5         | 20000 |  3    |

Take user A as an example. At event order 1, A has travelled only to 60616: 1 stop. At event order 2, A has travelled to 60616 and 10000: 2 stops. At event order 3, the distinct stops are still 60616 and 10000: 2 stops.

I cannot use count distinct with partition by and order by. I want to do something like count(distinct(postal)) over (partition by user order by eventorder), but Snowflake does not allow distinct in a window function that has an order by clause.

Does anyone know how to solve this? Thanks a lot!


Solution

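  • Perhaps the simplest method is to number each user's visits to the same postal code in a subquery, then keep a running count of the rows numbered 1, i.e. the first visit to each postal code (here usr stands for the user column and t for the table):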

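    select t.*,
           -- running total of first visits = distinct postals seen so far
           sum(case when seqnum = 1 then 1 else 0 end) over (partition by usr order by eventorder) as num_postals
    from (select t.*,
                 -- seqnum = 1 marks the first time this usr reaches this postal
                 row_number() over (partition by usr, postal order by eventorder) as seqnum
          from t
         ) t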
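
As a quick sanity check, the query above can be run against the sample data from the question. The sketch below uses Python's built-in sqlite3 module rather than Snowflake, so it only confirms the logic, not Snowflake-specific behaviour. It assumes a SQLite build with window-function support (3.25 or later), a table named t, and the column name usr in place of user. The helper name running_distinct_stops is hypothetical.

```python
import sqlite3

# Sample rows from the question: (usr, eventorder, postal).
ROWS = [
    ("A", 1, 60616), ("A", 2, 10000), ("A", 3, 60616),
    ("B", 1, 20000), ("B", 2, 30000), ("B", 3, 40000),
    ("B", 4, 30000), ("B", 5, 20000),
]

# Same shape as the query in the answer, with an explicit column list
# and a final ORDER BY so the output order is deterministic.
QUERY = """
select t.usr, t.eventorder, t.postal,
       sum(case when seqnum = 1 then 1 else 0 end)
           over (partition by usr order by eventorder) as num_postals
from (select t.*,
             row_number() over (partition by usr, postal order by eventorder) as seqnum
      from t
     ) t
order by usr, eventorder
"""


def running_distinct_stops(rows):
    """Load rows into an in-memory SQLite table and run the query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("create table t (usr text, eventorder integer, postal integer)")
        conn.executemany("insert into t values (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in running_distinct_stops(ROWS):
        print(row)
```

The last column printed should match the travelledStop column in the question: 1, 2, 2 for user A and 1, 2, 3, 3, 3 for user B.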