Search code examples
sqlsql-serverssmsssms-16

A Column to count number of distinct values in column X based on column Y?


In SSMS 2016, I have a select statement with various joins which gives me the following data:

| box_barcode | order_number | order_shipment_id | item     | qty |
|-------------|--------------|-------------------|----------|-----|
|  3330000001 | 0000105      | FP001             | tshirt-S | 1   |
|  3330000001 | 0000105      | FP001             | tshirt-M | 2   |
|  3330000001 | 0000105      | FP001             | tshirt-L | 2   |
|  3330000005 | 0000108      | FP002             | shorts-S | 2   |
|  3330000005 | 0000108      | FP002             | shorts-M | 1   |
|  3330000005 | 0000120      | FP002             | shorts-S | 1   |
|  3330000010 | 0000120      | FP003             | shirts-M | 2   |
|  3330000010 | 0000120      | FP003             | shirts-L | 2   |
|  3330000010 | 0000121      | FP003             | shirts-S | 3   |
|  3330000010 | 0000121      | FP003             | shirts-M | 3   |
|  3330000010 | 0000122      | FP003             | shirts-S | 2   |

I'd like to add a column to count the number of distinct order_numbers for each box_barcode, for the desired result:

| box_barcode | order_number | order_shipment_id | item     | qty | count |
|-------------|--------------|-------------------|----------|-----|-------|
|  3330000001 | 0000105      | FP001             | tshirt-S | 1   | 1
|  3330000001 | 0000105      | FP001             | tshirt-M | 2   | 1
|  3330000001 | 0000105      | FP001             | tshirt-L | 2   | 1
|  3330000005 | 0000108      | FP002             | shorts-S | 2   | 2
|  3330000005 | 0000108      | FP002             | shorts-M | 1   | 2
|  3330000005 | 0000120      | FP002             | shorts-S | 1   | 2
|  3330000010 | 0000120      | FP003             | shirts-M | 2   | 3
|  3330000010 | 0000120      | FP003             | shirts-L | 2   | 3
|  3330000010 | 0000121      | FP003             | shirts-S | 3   | 3
|  3330000010 | 0000121      | FP003             | shirts-M | 3   | 3
|  3330000010 | 0000122      | FP003             | shirts-S | 2   | 3

I'm struggling to find out how best to achieve this. I know of count(distinct..), but would I have to put my current query into a subquery for the count to go against the results of that query first?


Solution

  • One more option with dense_rank and max.

    select t.*,
           max(rnk) over(partition by box_barcode) as distinct_count
    from (select t.*,
          dense_rank() over(partition by box_barcode order by order_numbers) as rnk
          from t
         ) t
    

    The highest ranked row (using dense_rank) will be the distinct number of order numbers per box_barcode.