Search code examples
sqlsql-servergroup-by

Select rows where the combination of two columns is unique and we only display rows where the first column is not unique


I have an order line table that looks like this:

ID Order ID Product Reference Variant
1 1 Banana Green
2 1 Banana Yellow
3 2 Apple Green
4 2 Banana Brown
5 3 Apple Red
6 3 Apple Yellow
7 4 Apple Yellow
8 4 Banana Green
9 4 Banana Yellow
10 4 Pear Green
11 4 Pear Green
12 4 Pear Green

I want to know how often people place an order with a combination of different fruit products. I want to know the orderId for that situation and which productReference was combined in the orders.

I only care about the product, not the variant.

I would imagine the desired output looking like this - a simple table output that gives insight in what product combos are ordered:

Order ID Product
2 Banana
2 Apple
4 Banana
4 Apple
4 Pear

I just need data output of the combination Banana+Apple and Banana+Apple+Pear happening so I can get more insight in the frequency of how often this happens. We expect most of our customers to only order Apple, Banana or Pear products, but that assumption needs to be verified.

Problem

I kind of get stuck after the first step.

select orderId, productReference, count(*) as amount
from OrderLines
group by orderId, productReference

This outputs:

Order ID Product Reference amount
1 Banana 2
2 Apple 1
2 Banana 1
3 Apple 2
4 Apple 1
4 Banana 2
4 Pear 3

I just don't know how to move on from this step to get the data I want.


Solution

  • You can use a window count() over()

    select * 
    from
    (   
        select orderId, productReference, count(*) as amount
           , count(productReference) over(partition by orderId) np
        from OrderLines
        group by orderId, productReference    
    ) t
    where np > 1