Tags: sql, oracle, indexing, oracle11g, sql-execution-plan

Statistics rapidly changing within transaction - fixing execution plan


A problem I'm facing (Oracle 11g):

I create a table, let's call it table_xyz, with an index (not unique, no primary key). I create a package with a procedure that will insert, let's say, 10 million records monthly. It's not a simple "insert into" - it's a thousand lines of procedural code, and some of it also selects data from table_xyz to calculate what data to insert next.

For example, somewhere within the procedure there is a query like the c_previous cursor in the example code below.

Now, there is a problem: when the procedure is run for the first time, all queries on table_xyz will have execution plans based on the moment when there were 0 records in table_xyz. So all queries will effectively full-scan table_xyz instead of starting to use indexes at some point. This leads to terrible performance, and in my case the first run will actually never finish...

Now, there are three approaches I thought of:

  1. At some point within the transaction, recalculate statistics. For example, run "analyze table / analyze index compute statistics" after the count of records in table_xyz reaches a power of 10. This is not possible, however, since ANALYZE commits the transaction and I cannot allow that.
  2. At some point within the transaction, recalculate statistics as above, but do it in an autonomous transaction. This does not work either, since the new statistics will not be visible to the main transaction (I tested that).
  3. Just hint all the cursors that use table_xyz with an INDEX hint. This gets the job done, but it is ugly and generally frowned upon in the codebase.
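For reference, the hinted version of such a cursor could look like the sketch below. Note that the hint is spelled INDEX (taking the table alias and the index name); the index name table_xyz_idx comes from the example DDL further down.

```sql
-- Hinted lookup cursor: the optimizer is forced to probe the index
-- regardless of what the (stale) statistics say.
cursor c_previous is
   select /*+ INDEX(t table_xyz_idx) */ t.some_value
     from table_xyz t
    where t.idx in (round(p_idx / 2, 0), round(p_idx / 3, 0), round(p_idx / 5, 0))
    order by t.idx desc;
```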

Are there any other ways?

I attach some code. It is just an example; please do not try to optimize it by removing the procedure and so on.

create table table_xyz (idx        number(10)  /* specifically this is NOT a primary key */
                       ,some_value varchar2(10)
                       );
create index table_xyz_idx on table_xyz (idx);

declare

   cursor idxes is
      select level idx
        from dual d
       connect by level < 100000;

   current_val varchar2(10);

   function calculate_some_value(p_idx number) return varchar2
   is
      cursor c_previous is
         select t.some_value
           from table_xyz t
          where t.idx in (round(p_idx / 2, 0), round(p_idx / 3, 0), round(p_idx / 5, 0))
          order by t.idx desc
        ;
      x varchar2(100);
   begin
      open c_previous;
      fetch c_previous into x;
      close c_previous;
      x := nvl(x, 'XYZ');
      if mod(p_idx, 2) = 0 then
         x := x || '2';
      elsif mod(p_idx, 3) = 0 then
         x := '3' || x;
      elsif mod(p_idx, 5) = 0 then
         x := substr(x, 1,1) || '5' || substr(x, 2, 2 + mod(p_idx, 7));
      end if;
      
      
      x := substr(x, 1, 10);
      return x;
   end calculate_some_value;


begin
   
   for idx in idxes
   loop
       current_val := calculate_some_value(idx.idx);
       insert into table_xyz(idx, some_value) values (idx.idx, current_val);
   
   end loop;

end;
/

Solution

  • Consider taking a look at the DBMS_STATS package.

    Option A: use the DBMS_STATS procedures for manually setting table, column, and index statistics (i.e., SET_TABLE_STATS, SET_COLUMN_STATS, and SET_INDEX_STATS, respectively). Then use DBMS_STATS.LOCK_TABLE_STATS to keep your manually set statistics from being overwritten (e.g., by a DBA gathering schema statistics while your table happens to be empty).

    Option B: run your procedure as-is and then manually gather statistics on the table afterwards. Then, as above, use DBMS_STATS.LOCK_TABLE_STATS to keep them from being overwritten.

    Either way, the idea is to set or gather statistics on your table and then lock them in place.

    If you want to get fancier, maybe you could automate this. E.g.,

    1. At install time, manually set the statistics and lock them for your first run.
    2. In your procedure code, at the end, unlock the statistics, gather them, and re-lock them.
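Sketched in code, the automated variant might look like this. The statistic values passed to the SET_* calls here are illustrative guesses, not measured numbers; the point is only that they describe the table as it will look mid-load, so the optimizer prefers the index from the very first run.

```sql
-- At install time: seed optimizer statistics that describe the table
-- as it will look during the load, then lock them in place.
begin
   dbms_stats.set_table_stats(ownname => user,
                              tabname => 'TABLE_XYZ',
                              numrows => 10000000,    -- expected monthly volume
                              numblks => 100000,      -- rough guess
                              avgrlen => 20);         -- rough guess
   dbms_stats.set_index_stats(ownname => user,
                              indname => 'TABLE_XYZ_IDX',
                              numrows => 10000000,
                              numdist => 10000000);   -- idx is near-unique here
   dbms_stats.lock_table_stats(user, 'TABLE_XYZ');
end;
/

-- At the end of the monthly procedure: replace the seeded values with
-- real statistics, then re-lock them for the next run.
begin
   dbms_stats.unlock_table_stats(user, 'TABLE_XYZ');
   dbms_stats.gather_table_stats(ownname => user,
                                 tabname => 'TABLE_XYZ',
                                 cascade => true);    -- gather index stats too
   dbms_stats.lock_table_stats(user, 'TABLE_XYZ');
end;
/
```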