Search code examples
pivotsnowflake-cloud-data-platform

Using Count Distinct with Pivot in Snowflake


I am trying to do a pivoting on column Join_mon and get aggregate count for each ID , as shown in following query;

select *
from CTE3
pivot(COUNT(DISTINCT platform_payer_name) for Join_Mon in (
 '2021-03-01',
 '2021-02-01',
  '2021-01-01',
 '2020-12-01'

        ))
  as p
order by ID
)

As you can see I am trying to get distinct count her for the column platform_payer_name. But it is giving the following error;

SQL compilation error: syntax error line 48 at position 16 unexpected 'DISTINCT'

I am quite positive DISTINCT works with COUNT in snowflake. Can I get some help why it is failing here. Help is appreciated.


Solution

  • so making some fake data that maps to your pivot, albeit I dropped the excessive paren

    with cte3(id, platform_payer_name, Join_Mon) as (
        select * from values
            (1,'aa', '2021-03-01'),
            (1,'aa', '2021-03-01'),
            (1,'aa', '2021-03-01'),
            (1,'aa', '2021-02-01'),
            (2,'bb', '2012-03-01'),
            (2,'cc', '2020-12-01')
    )
    select *
    from CTE3 AS c
    pivot(COUNT(c.platform_payer_name) for c.Join_Mon in (
             '2021-03-01',
             '2021-02-01',
             '2021-01-01',
             '2020-12-01' )
    ) as p
    order by id;
    

    gives:

    ID  '2021-03-01'    '2021-02-01'    '2021-01-01'    '2020-12-01'
    1   3               1               0               0
    2   0               0               0               1
    

    so makes sense you want distinct in there

    but it seems it's not supported..

    so while it's somewhat cut'n'paste error prone, it does "work":

    with cte3(id, platform_payer_name, Join_Mon) as (
        select * from values
            (1,'aa', '2021-03-01'),
            (1,'aa', '2021-03-01'),
            (1,'aa', '2021-03-01'),
            (1,'aa', '2021-02-01'),
            (2,'bb', '2012-03-01'),
            (2,'cc', '2020-12-01')
    )
    select id
        ,count(distinct(iff(Join_Mon='2021-03-01',platform_payer_name,null))) as "2021-03-01"
        ,count(distinct(iff(Join_Mon='2021-02-01',platform_payer_name,null))) as "2021-02-01"
        ,count(distinct(iff(Join_Mon='2021-01-01',platform_payer_name,null))) as "2021-01-01"
        ,count(distinct(iff(Join_Mon='2020-12-01',platform_payer_name,null))) as "2020-12-01"
    from CTE3 AS c
    group by 1 order by 1;
    

    gives:

    ID  2021-03-01  2021-02-01  2021-01-01  2020-12-01
    1   1           1           0           0
    2   0           0           0           1
    

    which works because pivot is doing two task, the first is moving values into columns if matching the input, thus that is the same as:

    with cte3(id, platform_payer_name, Join_Mon) as (
    select * from values
        (1,'aa', '2021-03-01'),
        (1,'aa', '2021-03-01'),
        (1,'aa', '2021-03-01'),
        (1,'aa', '2021-02-01'),
        (2,'bb', '2012-03-01'),
        (2,'cc', '2020-12-01')
    )
    select id
        ,iff(Join_Mon='2021-03-01',platform_payer_name,null) as "2021-03-01"
        ,iff(Join_Mon='2021-02-01',platform_payer_name,null) as "2021-02-01"
        ,iff(Join_Mon='2021-01-01',platform_payer_name,null) as "2021-01-01"
        ,iff(Join_Mon='2020-12-01',platform_payer_name,null) as "2020-12-01"
    from CTE3 AS c
    order by 1;
    

    which gives:

    ID, 2021-03-01, 2021-02-01, 2021-01-01, 2020-12-01
    1,  aa,         NULL,       NULL,       NULL
    1,  aa,         NULL,       NULL,       NULL
    1,  aa,         NULL,       NULL,       NULL
    1,  NULL,       aa,         NULL,       NULL
    2,  NULL,       NULL,       NULL,       NULL
    2,  NULL,       NULL,       NULL,       cc
    

    which can then have a count(distinct x) ran over each column.

    select id
        ,count(distinct("2021-03-01")) as "2021-03-01"
        ,count(distinct("2021-02-01")) as "2021-02-01"
        ,count(distinct("2021-01-01")) as "2021-01-01"
        ,count(distinct("2020-12-01")) as "2020-12-01"
    from (
        select id
            ,iff(Join_Mon='2021-03-01',platform_payer_name,null) as "2021-03-01"
            ,iff(Join_Mon='2021-02-01',platform_payer_name,null) as "2021-02-01"
            ,iff(Join_Mon='2021-01-01',platform_payer_name,null) as "2021-01-01"
            ,iff(Join_Mon='2020-12-01',platform_payer_name,null) as "2020-12-01"
        from CTE3 AS c
    )
    group by id
    order by id;
    

    or can be done inline as I showed in the first answer.