sql-server t-sql distinct window-functions

Distinct Counts in a Window Function

I'm trying to get a distinct count of rows within a window function that has multiple levels of partitioning. Below is a sample of my data.

PRODUCT_ID	KEY_ID	STORECLUSTER
1000078	120	LLNY
1000078	202	LLF
1000078	202	LLNY
1000078	202	LLNY

I want to look at each PRODUCT_ID and then each unique KEY_ID and determine how many unique STORECLUSTERS there are per KEY_ID. For example PRODUCT_ID 1000078 has two unique KEY_ID's (120 and 202) of which 120 has 1 unique STORECLUSTER and 202 has 2 unqiue STORECLUSTER's. I've tried using a RANK() and DENSE_RANK() but I can't seem to get the partitioning correct. I would like to get a table that looks like this:

PRODUCT_ID	KEY_ID	STORECLUSTER	STORECLUSTER_COUNT
1000078	120	LLNY	1
1000078	202	LLF	2
1000078	202	LLNY	2
1000078	202	LLNY	2

Solution

Unfortunately, SQL Server does not support COUNT(DISTINCT as a window function.

So you need to nest window functions. I find the simplest and most efficient method is MAX over a DENSE_RANK, but there are others.

The partitioning clause is the equivalent of GROUP BY in a normal aggregate, then the value you are DISTINCTing goes in the ORDER BY of the DENSE_RANK. So you calculate a ranking, while ignoring tied results, then take the maximum rank, per partition.

SELECT
  PRODUCT_ID,
  KEY_ID,
  STORECLUSTER,
  STORECLUSTER_COUNT = MAX(rn) OVER (PARTITION BY PRODUCT_ID, KEY_ID)
FROM (
    SELECT *,
      rn = DENSE_RANK() OVER (PARTITION BY PRODUCT_ID, KEY_ID ORDER BY STORECLUSTER)
    FROM YourTable t
) t;

db<>fiddle