Search code examples
sqldatabasepostgresqlgreatest-n-per-groupwindow-functions

How to get distinct using `OVER(PARTITION BY ...)`


How to get distinct using OVER(PARTITION BY ...)?

SELECT ACCT_TYPE_CD AS entity,
         ACCT_ID AS entity_VAL,
         COUNT (1) AS CNT
    FROM ACCOUNT
GROUP BY ACCT_TYPE_CD, ACCT_ID
  HAVING COUNT (1) > 1;

entity      |entity_val      |cnt |
------------|----------------|----|
EMPID       |1               |3   |
EMPID       |1066            |3   |

The same I want using below query, so that I should be able to use this as inner query.

select  ACCT_TYPE_CD, ACCT_ID,  COUNT(*) OVER(PARTITION BY ACCT_TYPE_CD, ACCT_ID) AS CNT
from ACCOUNT ACCT; 

acct_type_cd |ACCT_ID     |cnt |
-------------|------------|----|
EMPID        |1           |3   |
EMPID        |1           |3   |
EMPID        |1           |3   |
EMPID        |1066        |3   |
EMPID        |1066        |3   |
EMPID        |1066        |3   |

Solution

  • Is this what you want?

    select *
    from (
        select  acct_type_cd, acct_id, count(*) over(partition by acct_type_cd, acct_id) as cnt
        from account acct
    ) t
    where cnt > 1
    

    That is: you can't use window functions in the where clause of a query - you need to compute the window count in a subquery, then filter in the outer query.

    Alternatively, if you want to generate the same result as the first query using window functions only, then you might be looking for row_number():

    select *
    from (
        select  
            acct_type_cd, 
            acct_id, 
            count(*) over(partition by acct_type_cd, acct_id) as cnt,
            row_number() over(partition by acct_type, acct_id order by acct_type, acct_id) rn
        from account acct
    ) t
    where cnt > 1 and rn = 1
    

    That said, I don't really why you would want to do that; using aggregation (as in the first query) is probably always more efficient than two window functions and filtering (as in the second query).

    Here is one last option using distinct on:

    select distinct on (acct_type_cd, acct_id)
        acct_type_cd, 
        acct_id, 
        count(*) over(partition by acct_type_cd, acct_id) as cnt
    from account acct
    order by acct_type_cd, acct_id