I've got two tables with a many to one relationship which I'll call Parent_Table and Child_Table (i.e. a parent has zero or more children, but children have exactly one parent). I need to count the number of parents who have at least one child that fulfills some condition. Which query is optimal?
Option 1 (pretty sure it's not this one)
SELECT COUNT(DISTINCT(pt.ID))
FROM PARENT_TABLE pt
JOIN CHILD_TABLE ct
ON pt.ID = ct.PARENT_ID
WHERE <parent meets some condition>
AND <child meets some condition>
Option 2
SELECT COUNT(pt.ID)
FROM PARENT_TABLE pt
WHERE pt.ID in
(
SELECT ct.PARENT_ID
FROM CHILD_TABLE ct
WHERE <child meets condition>
)
AND <parent meets some condition>
Option 3 (my guess as the fastest)
SELECT COUNT(pt.ID)
FROM PARENT_TABLE pt
WHERE EXISTS
(
SELECT 1
FROM CHILD_TABLE ct
WHERE ct.PARENT_ID = pt.ID
AND <child meets condition>
)
AND <parent meets some condition>
Or is it something else entirely? Does it depend on the sizes of each table, or the complexity of the two conditions, or whether the data is sorted?
EDIT: Database is Oracle.
The first query is slow, the others should run fast on most DB's.
Without knowing the DB it's hard to say more:
But: count(*) is often faster than count(names_field) and never slower
count(distinct (afield)) is slow
Or is it something else entirely?
That depends on the DB and the exact version of the DB.
Does it depend on the sizes of each table
Yes, that plays a big part
or the complexity of the two conditions
Possible
or whether the data is sorted?
If you want a fast select, all fields used to join must be indexed.
And all fields used in a where clause must either be indexed or low-cardinality.