Search code examples
sql-servert-sqldistinctwindow-functions

Distinct Counts in a Window Function


I'm trying to get a distinct count of rows within a window function that has multiple levels of partitioning. Below is a sample of my data.

PRODUCT_ID KEY_ID STORECLUSTER
1000078 120 LLNY
1000078 202 LLF
1000078 202 LLNY
1000078 202 LLNY

I want to look at each PRODUCT_ID and then each unique KEY_ID and determine how many unique STORECLUSTERS there are per KEY_ID. For example PRODUCT_ID 1000078 has two unique KEY_ID's (120 and 202) of which 120 has 1 unique STORECLUSTER and 202 has 2 unqiue STORECLUSTER's. I've tried using a RANK() and DENSE_RANK() but I can't seem to get the partitioning correct. I would like to get a table that looks like this:

PRODUCT_ID KEY_ID STORECLUSTER STORECLUSTER_COUNT
1000078 120 LLNY 1
1000078 202 LLF 2
1000078 202 LLNY 2
1000078 202 LLNY 2

Solution

  • Unfortunately, SQL Server does not support COUNT(DISTINCT as a window function.

    So you need to nest window functions. I find the simplest and most efficient method is MAX over a DENSE_RANK, but there are others.

    The partitioning clause is the equivalent of GROUP BY in a normal aggregate, then the value you are DISTINCTing goes in the ORDER BY of the DENSE_RANK. So you calculate a ranking, while ignoring tied results, then take the maximum rank, per partition.

    SELECT
      PRODUCT_ID,
      KEY_ID,
      STORECLUSTER,
      STORECLUSTER_COUNT = MAX(rn) OVER (PARTITION BY PRODUCT_ID, KEY_ID)
    FROM (
        SELECT *,
          rn = DENSE_RANK() OVER (PARTITION BY PRODUCT_ID, KEY_ID ORDER BY STORECLUSTER)
        FROM YourTable t
    ) t;
    

    db<>fiddle