I have the following example data structure of customer that can be part of multiple groups using a junction table and data:
CREATE TABLE customer(id) AS VALUES (0),(1),(2),(3);
CREATE TABLE groups(id) AS VALUES (1),(3),(5),(6);
CREATE TABLE customers_to_groups(customer_id, group_id) AS
VALUES (0, 1)--customer 0 is in group (5 OR 6) AND (1 OR 3)
,(0, 5)--customer 0 is in group (5 OR 6) AND (1 OR 3)
,(1, 1)
,(1, 90)
,(2, 1)
,(3, 3)--customer 3 is in group (5 OR 6) AND (1 OR 3)
,(3, 5)--customer 3 is in group (5 OR 6) AND (1 OR 3)
,(3, 90);
I need to get customers that have specific groups they are part of, and I need to get a list of all customers that are part of at least 1 group in multiple lists of group. For example I want to get all customers that are in group (5 OR 6) AND (1 OR 3)
, so for example a customer in group 5 and 1 wold be a returned, but somebody in group 1 and 90 or just group 1 not. With the provided sample data we would get the customer of id 0 and 3 only as they conform to the given rules above.
Just doing WHERE group_id IN (5,6) AND group_id IN (1,3)
does not seem to work, so I am looking for alternative.
I got this so far that works:
SELECT DISTINCT c.id
FROM customer c
INNER JOIN customers_to_groups at1 ON c.id = at1.customer_id
INNER JOIN customers_to_groups at2 ON c.id = at2.customer_id
WHERE at1.group_id IN (5, 6)
AND at2.group_id IN (1, 3);
Expected Results:
id |
---|
0 |
3 |
Is there a way to do it that is more performant?
We can GROUP BY
customer's id and use a HAVING
clause. There we can use CASE
or FILTER
if your RDBMS supports it. Postgres should.
There your conditions will be set.
The query will be:
SELECT c.id
FROM customer c
INNER JOIN customers_to_groups ctg
ON c.id = ctg.customer_id
AND ctg.group_id IN (1,3,5,6)
GROUP BY c.id
HAVING
COUNT(CASE WHEN ctg.group_id IN (1,3) THEN 1 END) > 0
AND COUNT(CASE WHEN ctg.group_id IN (5,6) THEN 1 END) > 0;
or
SELECT c.id
FROM customer c
INNER JOIN customers_to_groups ctg
ON c.id = ctg.customer_id
AND ctg.group_id IN (1,3,5,6)
GROUP BY c.id
HAVING
COUNT(*) FILTER(WHERE ctg.group_id IN (1,3)) > 0
AND COUNT(*) FILTER(WHERE ctg.group_id IN (5,6)) > 0;
Note: Above queries assume you really need to include both customers
and customers_to_groups
table and join them. If you don't need to include the customers
table, just remove it and select from the table customers_to_groups
only to improve the performance:
SELECT customer_id
FROM customers_to_groups ctg
WHERE ctg.group_id IN (1,3,5,6)
GROUP BY customer_id
HAVING
COUNT(*) FILTER(WHERE ctg.group_id IN (1,3)) > 0
AND COUNT(*) FILTER(WHERE ctg.group_id IN (5,6)) > 0;
This demo with your sample data and a lot of additional rows shows the performance differences.
Using FILTER
or CASE
is far faster (as it avoids a second JOIN
on the table customers_to_groups). The exact performance difference depends on the real data in your tables and which index(es) you use.