Tags: sql, performance, oracle, oracle11g, sql-execution-plan

Large amount of projected I/O with Oracle, even if only a single record is fetched


I often encounter the following situation in my Oracle execution plans:

Operation                   | Object  | Order | Rows | Bytes | Projection
----------------------------+---------+-------+------+-------+-------------
TABLE ACCESS BY INDEX ROWID | PROD    |     7 |   2M |   28M | PROD.VALUE
  INDEX UNIQUE SCAN         | PROD_PK |     6 |   1  |       | PROD.ROWID

This is an extract from a larger execution plan. Essentially, I'm accessing (joining) a table via its primary key. Typically, there is another table ACCO with ACCO.PROD_ID = PROD.ID, where PROD_PK is the primary key on PROD.ID. Obviously, the table can be accessed using a UNIQUE SCAN, but as soon as I have some silly projection on that table, it seems as though the whole table (around 2 million rows) is planned to be read into memory. I get a lot of I/O and buffer gets. When I remove the projection from the greater query, the problem disappears:

Operation                   | Object  | Order | Rows | Bytes | Projection
----------------------------+---------+-------+------+-------+-------------
TABLE ACCESS BY INDEX ROWID | PROD    |     7 |   1  |     8 | PROD.ID
  INDEX UNIQUE SCAN         | PROD_PK |     6 |   1  |       | PROD.ROWID

I don't understand this behaviour. What could be the reasons for it? Note that I cannot post the complete query: it is rather complex and involves a lot of calculations. The pattern, however, is often the same.
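For reference, the recurring pattern is roughly the following (table and column names simplified; `:acco_id` is a hypothetical bind variable, not from the actual query):

```sql
-- Hypothetical minimal form of the recurring pattern: ACCO is
-- filtered down to a single row, and PROD is joined via its primary
-- key (PROD_PK on PROD.ID). Yet as soon as PROD.VALUE is projected,
-- the plan estimates millions of rows for the PROD access.
select acco.id, prod.value
from acco
join prod on acco.prod_id = prod.id
where acco.id = :acco_id;  -- single-row access predicate
```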

UPDATE: I managed to boil my rather complex setup down to a simple simulation that produces a similar execution plan in both cases (when projecting PROD.VALUE and when leaving it out):

Create the following database:

-- products have a value
create table prod as
select level as id, 10 as value from dual 
connect by level < 100000;
alter table prod add constraint prod_pk primary key (id);

-- some products are accounts
create table acco as
select level as id, level as prod_id from dual 
connect by level < 50000;
alter table acco 
  add constraint acco_pk primary key (id);
alter table acco 
  add constraint acco_prod_fk foreign key (prod_id) references prod (id);

-- accounts have transactions with values
create table trxs as
select level as id, mod(level, 10) + 1 as acco_id, mod(level, 17) + 1 as value
from dual connect by level < 100000;
alter table trxs 
  add constraint trxs_pk primary key (id);
alter table trxs 
  add constraint trxs_acco_fk foreign key (acco_id) references acco (id);

create index acco_i on acco(prod_id);
create index trxs_i on trxs(acco_id);

alter table acco modify prod_id not null;
alter table trxs modify acco_id not null;
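The row estimates in the plans depend on optimizer statistics; since the tables above were just created, it may be worth gathering statistics first so the comparison is meaningful (standard DBMS_STATS usage, assuming the schema above):

```sql
-- Gather optimizer statistics so the Rows / Bytes estimates in the
-- execution plans are based on the actual table contents
begin
  dbms_stats.gather_table_stats(user, 'PROD');
  dbms_stats.gather_table_stats(user, 'ACCO');
  dbms_stats.gather_table_stats(user, 'TRXS');
end;
/
```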

Run the following query:

select v2.*
from (
  select 
    -- This calculates the balance for every transaction as a
    -- running total, subtracting trxs.value from the product's value
    --
    -- This is the "projection" I mentioned that causes I/O. Leaving it
    -- out (setting it to 0) would improve the execution plan
    prod.value - v1.total balance,
    acco.id acco_id
  from (
    select 
      acco_id,
      sum(value) over (partition by acco_id
                       order by id
                       rows between unbounded preceding 
                       and current row) total
    from trxs
  ) v1
  join acco on v1.acco_id = acco.id
  join prod on acco.prod_id = prod.id
) v2
-- This is the single-row access predicate. From here, it is
-- clear that there can only be 1 acco and 1 prod
where v2.acco_id = 1;
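To reproduce the Rows / Bytes / Projection columns shown in the plans above, the plan can be displayed with DBMS_XPLAN; the 'ALL' format level includes the Column Projection Information section (this is standard Oracle tooling, shown here against a simplified stand-in query, not the exact statement above):

```sql
-- Prefix the statement with EXPLAIN PLAN FOR, then display the plan.
-- The 'ALL' format adds the projection information to the output.
explain plan for
select prod.value
from acco
join prod on acco.prod_id = prod.id
where acco.id = 1;

select * from table(dbms_xplan.display(null, null, 'ALL'));
```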

Analyse

When analysing execution plans for the above query (with or without the prod.value projection), I can reproduce the excessive row / byte estimates for the access to the prod table.

I have found a workaround for this issue, but I'm really interested in an explanation of what is going wrong, and how I could correct the problem without changing the query too much.

Update

OK, after much more analysis, I have to say that the actual problematic I/O was due to a wrong index being used somewhere else entirely. Unfortunately, this was not reflected clearly enough in the overall statistics (or in the execution plan) to notice.

As far as this question goes, I'm still curious about the projected I/O in the execution plan, as that appears to confuse our DBA (and me) time and again. And sometimes, it really is the source of I/O problems...
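A wrong index pick like this one usually shows up as a large gap between estimated and actual row counts, which the estimated plan alone does not reveal. One way to see both side by side (standard Oracle tooling, assuming the statement can be executed once) is:

```sql
-- Run the statement once with runtime statistics collection enabled
select /*+ gather_plan_statistics */ count(*)
from trxs
where acco_id = 1;

-- Then display the last cursor's plan with estimated (E-Rows) and
-- actual (A-Rows) row counts next to each other
select *
from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));
```

A large discrepancy between E-Rows and A-Rows on a given plan step points at the operation where the optimizer's estimate, and hence possibly its index choice, went wrong.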


Solution

  • It might be interesting to note that I have checked various scenarios, including a specific solution for this specific example. Re-phrasing the sample query as follows solves the problem in this case:

    select
      -- Explicitly project value in a nested loop. This seems to be much cheaper
      -- in this specific case
      (select value from prod where id = v2.prod_id) - v2.balance,
      v2.acco_id
    from (
      select 
        -- Now, balance is only a running total, not the running total
        -- added to PROD.VALUE
        v1.total balance,
        acco.id acco_id,
        acco.prod_id prod_id
      from (
        select 
          acco_id,
          sum(value) over (partition by acco_id
                           order by id
                           rows between unbounded preceding 
                           and current row) total
        from trxs
      ) v1
      -- The JOIN of PROD is no longer needed
      join acco on v1.acco_id = acco.id
    ) v2
    where v2.acco_id = 1;
    

    But I still don't understand why Oracle would project so much I/O in its execution plan if I join prod earlier in this query...