I'm having a slow start to the morning. I thought there was a more efficient way to make the following query using a join, instead of two independent selects -- am I wrong?
Keep in mind that I've simplified/reduced my query into this example for SO purposes, so let me know if you have any questions as well.
SELECT DISTINCT c.*
FROM customers c
WHERE c.customer_id IN (select customer_id from customers_cars where car_make = 'BMW')
AND c.customer_id IN (select customer_id from customers_cars where car_make = 'Ford')
;
-- Simple tables to demonstrate point
CREATE TABLE customers (
customer_id serial,
name text
);
CREATE TABLE customers_cars (
customer_id integer,
car_make text
);
-- Populate tables
INSERT INTO customers(name) VALUES
('Joe Dirt'),
('Penny Price'),
('Wooten Nagen'),
('Captain Planet')
;
INSERT INTO customers_cars(customer_id,car_make) VALUES
(1,'BMW'),
(1,'Merc'),
(1,'Ford'),
(2,'BMW'),
(2,'BMW'), -- Notice car_make is not unique
(2,'Ferrari'),
(2,'Porche'),
(3,'BMW'),
(3,'Ford');
-- ids 1 and 3 both have BMW and Ford
You don't need to join to customers
at all (given relational integrity).
Generally, this is a case of relational division. We assembled an arsenal of techniques under this related question:
If (customer_id, car_make)
was defined unique in customers_cars
, it would get much simpler:
SELECT customer_id
FROM customers_cars
WHERE car_make IN ('BMW', 'Ford')
GROUP BY 1
HAVING count(*) = 2;
Since (customer_id, car_make)
is not unique, we need an extra step.
For only a few cars, your original query is not that bad. But (especially with duplicates!) EXISTS
is typically faster than IN
, and we don't need the final DISTINCT
:
SELECT customer_id -- no DISTINCT needed.
FROM customers c
WHERE EXISTS (SELECT 1 FROM customers_cars WHERE customer_id = c.customer_id AND car_make = 'BMW')
AND EXISTS (SELECT 1 FROM customers_cars WHERE customer_id = c.customer_id AND car_make = 'Ford');
Above query gets verbose and less efficient for a longer list of cars. For an arbitrary number of cars I suggest:
SELECT customer_id
FROM (
SELECT customer_id, car_make
FROM customers_cars
WHERE car_make IN ('BMW', 'Ford')
GROUP BY 1, 2
) sub
GROUP BY 1
HAVING count(*) = 2;