I have an order line table that looks like this:
ID | Order ID | Product Reference | Variant |
---|---|---|---|
1 | 1 | Banana | Green |
2 | 1 | Banana | Yellow |
3 | 2 | Apple | Green |
4 | 2 | Banana | Brown |
5 | 3 | Apple | Red |
6 | 3 | Apple | Yellow |
7 | 4 | Apple | Yellow |
8 | 4 | Banana | Green |
9 | 4 | Banana | Yellow |
10 | 4 | Pear | Green |
11 | 4 | Pear | Green |
12 | 4 | Pear | Green |
I want to know how often people place an order with a combination of different fruit products. I want to know the orderId
for that situation and which productReference
was combined in the orders.
I only care about the product, not the variant.
I would imagine the desired output looking like this - a simple table output that gives insight in what product combos are ordered:
Order ID | Product |
---|---|
2 | Banana |
2 | Apple |
4 | Banana |
4 | Apple |
4 | Pear |
I just need data output of the combination Banana+Apple
and Banana+Apple+Pear
happening so I can get more insight in the frequency of how often this happens. We expect most of our customers to only order Apple
, Banana
or Pear
products, but that assumption needs to be verified.
Problem
I kind of get stuck after the first step.
select orderId, productReference, count(*) as amount
from OrderLines
group by orderId, productReference
This outputs:
Order ID | Product Reference | amount |
---|---|---|
1 | Banana | 2 |
2 | Apple | 1 |
2 | Banana | 1 |
3 | Apple | 2 |
4 | Apple | 1 |
4 | Banana | 2 |
4 | Pear | 3 |
I just don't know how to move on from this step to get the data I want.
You can use a window count() over()
select *
from
(
select orderId, productReference, count(*) as amount
, count(productReference) over(partition by orderId) np
from OrderLines
group by orderId, productReference
) t
where np > 1