Let's say I have the following tables: t1 (20 rows), t2 (5), t3 (80), t4 (20), and t5 (10).
select a + b from t1, t2
where a = b
and c in
(
    select d from t3, t4, t5
    where d = e and f = g and h > 10 and t1.a = d
);
As you can see, we have several filters, Cartesian products, and an external (correlated) reference.
The parent query would be processed 100 (20*5) times before filtering, and the subquery would be processed 16,000 (20*80*10) times. Since it has an external reference, the subquery is processed once for every row of the parent query, so it is processed 1,600,000 times in total.
The question is: does the entire query get processed 1,600,000 times, or 1,600,000 + 100 times (that is, plus the parent's passes)?
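To spell out the arithmetic behind those numbers, here is the naive row-at-a-time counting model the question assumes (this is just the model being asked about, not necessarily how the database actually executes):

```python
# Naive counting model: the parent product is materialized once,
# and the correlated subquery's product is evaluated once per parent row.
t1, t2, t3, t4, t5 = 20, 5, 80, 20, 10

parent_rows = t1 * t2                         # 20 * 5 = 100 rows before filtering
subquery_rows = t3 * t4 * t5                  # 80 * 20 * 10 = 16,000 rows per evaluation
subquery_total = parent_rows * subquery_rows  # 100 * 16,000 = 1,600,000

print(parent_rows, subquery_rows, subquery_total)  # 100 16000 1600000
```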
If you are wondering, this is about an ambiguous question asked in class that wasn't properly answered before an exam.
I'm not sure what the answer is, but it's much more complicated than a single number. This question has two incorrect premises: Oracle does not necessarily execute queries in the order they are written, and the joins almost certainly do not have an algorithmic complexity of O(M * N).
With the information given, it's impossible to tell how Oracle will run that query. Oracle, and probably most other databases, does not have to run statements in the order they are written.
The Oracle optimizer will frequently transform the query to run it more efficiently. With view merging, Oracle may flatten the query into a set of regular joins and run them in a different order. Or Oracle may remove some tables from the execution entirely, with join elimination.
Generate an explain plan to see how a query is running.
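In Oracle that means running `EXPLAIN PLAN FOR <query>` and then `SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY)`. As a runnable illustration of the same idea without a database server, here is the question's query against SQLite's `EXPLAIN QUERY PLAN` through Python (the column-to-table assignment is inferred from the question; the plan output format is SQLite's, not Oracle's):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Toy schema mirroring the question's tables; which column lives in
# which table is an assumption based on the query text.
cur.executescript("""
    CREATE TABLE t1(a INT, c INT);
    CREATE TABLE t2(b INT);
    CREATE TABLE t3(d INT);
    CREATE TABLE t4(e INT, f INT);
    CREATE TABLE t5(g INT, h INT);
""")

# Ask the engine how it *plans* to run the query. The plan, not the
# written order of the FROM list, dictates execution.
plan = cur.execute("""
    EXPLAIN QUERY PLAN
    SELECT a + b FROM t1, t2
    WHERE a = b
      AND c IN (SELECT d FROM t3, t4, t5
                WHERE d = e AND f = g AND h > 10 AND t1.a = d)
""").fetchall()

for row in plan:
    print(row)
```

The exact plan lines vary by engine and version, which is rather the point: the execution strategy is the optimizer's choice.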
See this question for examples of analysis of join complexity. We don't know exactly how Oracle's algorithms work, and I'm not an expert in algorithms, but I think it's safe to say that databases will only rarely process O(M * N) rows for a join.
Maybe it uses an index nested loop, O(N * log(M)); or a hash join, O(N + M); or a sort merge, O(N*log(N) + M*log(M)); or a sort merge using an already-sorted full index scan, O(N*log(N) + M); etc.
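To make the difference concrete, here is a toy comparison in Python counting the elementary operations of a naive nested loop versus a hash join. The row counts come from the question's t1 (20 rows) and t3 (80 rows); the join condition t1.a = t3.d and the data values are assumptions for illustration:

```python
# Toy join between t1 (20 rows) and t3 (80 rows) on t1.a = t3.d.
t1 = [{"a": i % 10} for i in range(20)]
t3 = [{"d": i % 10} for i in range(80)]

# Naive nested loop: every pair is compared -> O(N * M) operations.
nested_ops = 0
for r1 in t1:
    for r3 in t3:
        nested_ops += 1  # one comparison per pair

# Hash join: build a hash table on t3, probe once per t1 row -> O(N + M).
hash_ops = 0
table = {}
for r3 in t3:  # build phase: M operations
    table.setdefault(r3["d"], []).append(r3)
    hash_ops += 1
for r1 in t1:  # probe phase: N operations
    _ = table.get(r1["a"], [])
    hash_ops += 1

print(nested_ops, hash_ops)  # 1600 vs 100
```

Even at these tiny sizes the nested loop does 16 times the work; the gap grows with the tables.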
And those time complexities are probably missing a lot of important details. For example, if it's a hash join, does it require multiple passes? And in practice, things like single-block IO versus multiple-block IO can become more important than the complexity.
You should either approach this problem more pragmatically, by using real-life Oracle execution plans, or more theoretically, by using algorithmic time complexity. Don't expect to arrive at an answer that has a single, convenient number.