Tags: oracle-database, oracle19c

Gather stats on a large table


We have a very large table with almost 80 billion records and a single hash partition. Any suggestions on how to gather stats on this table? It is currently taking about a week to complete, or fails partway through with a "snapshot too old" error.

I am using the following parameters to gather stats:

BEGIN
  dbms_stats.gather_table_stats(ownname          => 'ABC',
                                tabname          => 'table1',
                                estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE,
                                cascade          => DBMS_STATS.AUTO_CASCADE,
                                method_opt       => 'FOR ALL COLUMNS SIZE AUTO',
                                degree           => DBMS_STATS.DEFAULT_DEGREE);
END;

Oracle version: Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production Version 19.17.0.0.0


Solution

  • There are various speed tweaks available to you: (1) lower the estimate percent, (2) explicitly use parallelism, (3) request block sampling, (4) bypass indexes, and (5) disallow histograms by setting SIZE 1 on all columns:

    BEGIN
      dbms_stats.gather_table_stats(ownname          => 'ABC',
                                    tabname          => 'TABLE1',
                                    estimate_percent => 0.1,
                                    block_sample     => true,
                                    cascade          => false,
                                    method_opt       => 'FOR ALL COLUMNS SIZE 1',
                                    degree           => 16);
    END;
    

    Obviously you have to consult your own database hardware sizing to determine an appropriate degree of parallelism. Some systems can handle more, others less.
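
    As a rough starting point (this is just a sketch; V$PARAMETER and DBMS_STATS.AUTO_DEGREE are standard, but your privileges may vary), you can check how many CPUs the instance reports, or let Oracle choose the degree for you:

    -- Check what the instance reports for CPU capacity
    -- (requires SELECT access on V$PARAMETER)
    SELECT name, value
    FROM   v$parameter
    WHERE  name IN ('cpu_count', 'parallel_max_servers');

    -- Or let Oracle pick the degree based on the object size and instance settings
    BEGIN
      dbms_stats.gather_table_stats(ownname => 'ABC',
                                    tabname => 'TABLE1',
                                    degree  => DBMS_STATS.AUTO_DEGREE);
    END;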

    I have found that on massive tables a percent well below 1%, something like 0.1%, works just fine and goes roughly 10x faster than a 1% sample, which in turn is far faster than the default (effectively 100%).

    Block sampling can produce somewhat inaccurate results if your data is badly skewed and you use a tiny percent (like 0.001%), but in many cases it is fine at the 0.1% level or higher, and it does speed things up by randomly picking blocks rather than rows, which reduces the number of blocks that need to be read.

    Lastly, histograms are expensive, and often they are unneeded. If you find out later that you do need a histogram on a particular column to support an important application SQL statement that is being mis-optimized because of the missing histogram, you can explicitly request SIZE AUTO on just that column while keeping all the others at SIZE 1 (see the sketch after this paragraph). Histograms take a long time to gather, so they are best kept to a minimum and added on an as-needed basis.
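
    If you do end up needing a histogram on one column, a minimal sketch looks like this (STATUS_CODE is just a placeholder for whatever column your problem SQL filters on):

    BEGIN
      dbms_stats.gather_table_stats(ownname          => 'ABC',
                                    tabname          => 'TABLE1',
                                    estimate_percent => 0.1,
                                    block_sample     => true,
                                    cascade          => false,
                                    -- histogram only on STATUS_CODE, single bucket everywhere else
                                    method_opt       => 'FOR ALL COLUMNS SIZE 1 FOR COLUMNS STATUS_CODE SIZE AUTO',
                                    degree           => 16);
    END;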