Search code examples
sqloracle-databasejoinquery-performance

ORACLE join multiple tables performance


I have kinda complex question. Let's say that I have 7 tables (20mil+ rows each) (Table1, Table2 ...) with corresponding pk (pk1, pk2, ....) (cardinality among all tables is 1:1) I want to get my final table (using hash join) as:

Create table final_table as select   
t1.column1,  
t2.column2,  
t3.column3,  
t4.column4,  
t5.column5,  
t6.column6,  
t7.column7  
from table1 t1  
join table2 t2 on t1.pk1 = t2.pk2  
join table2 t3 on t1.pk1 = t3.pk3   
join table2 t4 on t1.pk1 = t4.pk4     
join table2 t5 on t1.pk1 = t5.pk5  
join table2 t6 on t1.pk1 = t6.pk6  
join table2 t7 on t1.pk1 = t7.pk7

I would like to know if it would be faster to create partial tables and then final table, like this?

Create table partial_table1 as select   
t1.column1,  
t2.column2  
from table1 t1  
join table2 t2 on t1.pk1 = t2.pk2



create table partial_table2 as select   
t1.column1, t1.column2  
t3.column3  
from partial_table1 t1  
join table3 t3 on t1.pk1 = t3.pk3



create table partial_table3 as select  
t1.column1, t1.column2, t1.column3   
t4.column4   
from partial_table1 t1   
join table3 t4 on t1.pk1 = t4.pk4



...  
...  
... 

I know it depends on RAM (because I want to use hash join), actual server usage, etc.. I am not looking for specific answer, I am looking for some explanations why and in what situations would it be better to use partial results or why it would it be better to use all 7 joins in 1 select.
Thanks, I hope that my question is easy to understand.


Solution

  • In general, it is not better to create temporary tables. SQL engines have an optimization phase and this optimization phase should do well as figuring out the best query plan.

    In the case of a bunch of joins, this is mostly about join order, use of indexes, and the optimal algorithm.

    This is a good default attitude. Does it mean that temporary tables are never useful for performance optimization? Not at all. Here are some exceptions:

    • The optimizer generates a suboptimal query plan. In this case, query hints can push the optimizer in the right direction. And, temporary tables can help.
    • Indexing the temporary tables. Sometimes an index on the temporary tables can be a big win for performance. The optimizer might not pick this up.
    • Re-use of temporary tables across queries.

    For your particular goal of using hash joins, you can use a query hint to ensure that the optimizer does what you would like. I should note that if the joins are on primary keys, then a hash join might not be the optimal algorithm.