Search code examples
sqloraclequery-optimization

How to optimize the SQL query to the Oracle database


I have an SQL query to the Oracle database (The query links the elements to the clusters related by the clusters by having the maximum average connection to each element of the cluster. The relationship of two elements is defined by number of links they have in common. We calculate the relation of the given element to the each element of the cluster, then take the relation of the element to the cluster as average of relations of this elements to all elements of the cluster, then pick the cluster to which our element has maximal relation and add the element to the cluster )

INSERT INTO t_clusters (transaction_id,
                    CLUSTER_ID,
                    word_id,
                    connection)
SELECT 72,
      CLUSTER_ID,
      word_id,
      avg_cnt
 FROM (  SELECT word_id, CLUSTER_ID, AVG (cnt) avg_cnt
           FROM (  SELECT a.word_id,
                          b.word_id bword,
                          c.CLUSTER_ID,
                          COUNT (*) cnt
                     FROM t_semantic a,
                          t_words a1,
                          t_semantic b,
                          t_clusters c
                    WHERE     a.transaction_id = 72
                          AND a.word_id = a1.word_id
                          AND a1.checked = 0
                          AND CLUSTER_ID <> 0
                          AND a.link_id = b.link_id
                          AND b.word_id = c.word_id
                 GROUP BY a.word_id, b.word_id, c.CLUSTER_ID)
       GROUP BY word_id, CLUSTER_ID
         HAVING AVG (cnt) >= 3) zz
WHERE avg_cnt = (SELECT MAX (avg_cnt)
                   FROM (  SELECT AVG (cnt) avg_cnt, word_id
                             FROM (  SELECT a.word_id,
                                            b.word_id bword,
                                            c.CLUSTER_ID,
                                            COUNT (*) cnt
                                       FROM t_semantic a,
                                            t_words a1,
                                            t_semantic b,
                                            t_clusters c
                                      WHERE     a.transaction_id = 72
                                            AND a.word_id = a1.word_id
                                            AND a1.checked = 0
                                            AND CLUSTER_ID <> 0
                                            AND a.link_id = B.link_id
                                            AND b.word_id = c.word_id
                                   GROUP BY a.word_id,
                                            b.word_id,
                                            c.CLUSTER_ID)
                         GROUP BY word_id, CLUSTER_ID
                           HAVING AVG (cnt) > 3) zzy
                  WHERE zzy.word_id = zz.word_id)

It executing already 23 hours

The number of records in the table of relations (t_semantic) is 532125 The number of records in the table of clusters (t_clusters) where stored elements, already connected to clusters is 23014 The number of records in the table with elements not connected to clusters yet (t_words) is 36111

Fields word_id, cluster_id, link_id indexed in all table. Is it any way to improve the perfomance of this query and why it's so slow?

Plan
INSERT STATEMENT  ALL_ROWSCost: 9,234                                                   
    28 FILTER                                               
        10 HASH GROUP BY  Cost: 9,234  Bytes: 46,000  Cardinality: 2,000                                            
            9 VIEW  Cost: 9,234  Bytes: 8,599,493  Cardinality: 373,891                                         
                8 SORT GROUP BY  Cost: 9,234  Bytes: 15,329,531  Cardinality: 373,891                                   
                    7 HASH JOIN  Cost: 5,215  Bytes: 15,329,531  Cardinality: 373,891                               
                        1 TABLE ACCESS FULL TABLE DKIM.T_WORDS Cost: 86  Bytes: 286,568  Cardinality: 35,821                            
                        6 HASH JOIN  Cost: 5,117  Bytes: 20,274,408  Cardinality: 614,376                           
                            4 HASH JOIN  Cost: 2,245  Bytes: 4,108,540  Cardinality: 205,427                        
                                2 INDEX FAST FULL SCAN INDEX (UNIQUE) DKIM.UK_CLUSTER Cost: 333  Bytes: 229,140  Cardinality: 22,914                    
                                3 TABLE ACCESS FULL TABLE DKIM.T_SEMANTIC Cost: 1,903  Bytes: 5,277,070  Cardinality: 527,707                   
                            5 TABLE ACCESS FULL TABLE DKIM.T_SEMANTIC Cost: 1,909  Bytes: 6,859,632  Cardinality: 527,664                       
        27 SORT AGGREGATE  Bytes: 26  Cardinality: 1                                            
            26 VIEW  Cost: 18  Bytes: 26  Cardinality: 1                                        
                25 FILTER                                   
                    24 SORT GROUP BY  Cost: 18  Bytes: 23  Cardinality: 1                               
                        23 VIEW  Cost: 18  Bytes: 23  Cardinality: 1                            
                            22 SORT GROUP BY  Cost: 18  Bytes: 41  Cardinality: 1                       
                                21 TABLE ACCESS BY INDEX ROWID TABLE DKIM.T_CLUSTERS Cost: 4  Bytes: 10  Cardinality: 1                     
                                    20 NESTED LOOPS  Cost: 17  Bytes: 41  Cardinality: 1                
                                        18 NESTED LOOPS  Cost: 9  Bytes: 93  Cardinality: 3             
                                            15 NESTED LOOPS  Cost: 5  Bytes: 21  Cardinality: 1         
                                                12 TABLE ACCESS BY INDEX ROWID TABLE DKIM.T_WORDS Cost: 2  Bytes: 8  Cardinality: 1     
                                                    11 INDEX UNIQUE SCAN INDEX (UNIQUE) DKIM.T_WORDS_PK Cost: 1  Cardinality: 1  
                                                14 TABLE ACCESS BY INDEX ROWID TABLE DKIM.T_SEMANTIC Cost: 3  Bytes: 13  Cardinality: 1     
                                                    13 INDEX RANGE SCAN INDEX DKIM.IND_SEMANTIC_WORD_ID Cost: 2  Cardinality: 1  
                                            17 TABLE ACCESS BY INDEX ROWID TABLE DKIM.T_SEMANTIC Cost: 4  Bytes: 30  Cardinality: 3         
                                                16 INDEX RANGE SCAN INDEX DKIM.IND_SEMANTIC_LINK_ID Cost: 2  Cardinality: 3     
                                        19 INDEX RANGE SCAN INDEX DKIM.CLUSTERS_WORD_ID Cost: 2  Cardinality: 1     

Solution

  • Using analytic functions to determine the max avg_cnt might help speed things up, e.g.:

    insert into t_clusters (transaction_id,
                            cluster_id,
                            word_id,
                            connection)
    select 72,
           cluster_id,
           word_id,
           avg_cnt
     from  (select   word_id,
                     cluster_id,
                     avg(cnt) avg_cnt,
                     max(avg(cnt)) over (partition by word_id) max_avg_cnt
            from     (select   a.word_id,
                               b.word_id bword,
                               c.cluster_id,
                               count (*) cnt
                      from     t_semantic a,
                               t_words a1,
                               t_semantic b,
                               t_clusters c
                      where    a.transaction_id = 72
                      and      a.word_id = a1.word_id
                      and      a1.checked = 0
                      and      cluster_id <> 0
                      and      a.link_id = b.link_id
                      and      b.word_id = c.word_id
                      group by a.word_id, b.word_id, c.cluster_id)
            group by word_id, cluster_id
            having   avg(cnt) >= 3)
    where  avg_cnt = max_avg_cnt;