SELECT DISTINCT options.id, options.foo_option_id, options.description
FROM vehicles
INNER JOIN vehicle_options ON vehicle_options.vehicle_id = vehicles.id
INNER JOIN options ON options.id = vehicle_options.option_id
INNER JOIN discounted_vehicles ON vehicles.id = discounted_vehicles.vehicle_id
WHERE discounted_vehicles.discount_id = 4;
The query above returns me 2067 rows, and it runs locally in 1.7 seconds. I'm wondering if it is as fast as it can be or if I can tweak it further somehow, as this dataset will grow fast overtime.
Things that I've tried without change in speed:
1 - Change the join order, joining from the smallest to the biggest table.
2 - Adding an index to discounted_vehicles.discount_id.
The best query depends on missing information.
This should be substantially faster in a typical setup:
SELECT id, foo_option_id, description
FROM options o
WHERE EXISTS (
SELECT
FROM discounted_vehicles d
JOIN vehicle_options vo USING (vehicle_id)
WHERE d.discount_id = 4
AND vo.option_id = o.id
);
Assuming referential integrity, enforced by FK constraints, we can omit the table vehicle
from the query and join from discounted_vehicles
to vehicle_options
directly.
Also, EXISTS
is typically faster if there are many qualifying rows per distinct option.
Ideally, you'd have multicolumn indexes on:
discounted_vehicles(discount_id, vehicle_id)
vehicle_options(vehicle_id, option_id)
Index columns in this order. You probably have a PK constraint on vehicle_options
providing the 2nd index, but the column order should match. Related:
Depending on actual data distribution, there may be faster query styles. Related:
Changing the join order is typically futile. Postgres reorders joins any way it expects to be fastest. (Exceptions apply.) Related: