Search code examples
sqloracle-databaseperformancequery-performancesqlperformance

Why is this Oracle query so slow?


Short story to explain how I got into this query performance mess:
I've been handed an old Oracle database to fix and maintain.

The data model was designed very badly, with no foresight.
The database is in terrible shape, and performs extremely slowly.
And of course, this database hasn't been tuned or modified in over 6 years ...

I'm starting with one of the many "snapshot" tables in this database.
(Rather than tracking history properly, the original developer just scheduled snapshots of records to be copied and stored in additional tables all over the place. So you have to query these snapshot tables to get historical analytics.)

This table has ~100 columns, but we don't care about most of them.
However, when I began work on this a couple days ago, this table had ONLY ONE INDEX: a NON-UNIQUE index of the ID column, with no primary key constraint at all.

This is a snapshot table, meaning it contains historical copies of rows, from different points in time.
For example, row #12345 on 9/20/19, and again the same row #12345 but as it was on 9/30/19
So, alas, the ID column must allow duplicate values.

So I figured, step one: create a compound primary key constraint on ID and snapshot_date together, to create a proper unique identifier.

The analytics query I'm trying to build is a frankenstein of many different pre-existing queries already being used in this application.
It looks like garbage, because that's what I have to work with ...

select efh.snapshot_date,
       max(efhp.snapshot_date) as previous_snapshot_date,
       substr(efh.edge_vp,1,instr(efh.edge_vp,'@oracle.com')-1) as edge_vp,
       substr(efh.edge_rm,1,instr(efh.edge_rm,'@oracle.com')-1) as edge_rm,
       sum(case when efh.oppty_status = 'Open' then NVL(efh.ARR_FORECAST, 0) else 0 end) as forecast,
       sum(case when efh.oppty_status = 'Open' then NVL(efh.ARR_BEST,0) else 0 end) as best,
       sum(case when efh.oppty_status = 'Won' then NVL(efh.ARR,0) else 0 end) as closed,
       sum(case when efh.oppty_status = 'Open' then nvl(efh.ARR_PIPELINE,0) else 0 end) as pipeline,
       sum(case when efh.oppty_status = 'Open' then NVL(efh.ARR_BEST,0) else 0 end) +
       sum(case when efh.oppty_status = 'Open' then nvl(efh.ARR_PIPELINE,0) else 0 end) as pipe_best,
       sum(case when efh.oppty_status = 'Won' then efh.ARR else 0 end) +
       sum(case when efh.oppty_status = 'Open' then NVL(efh.ARR_FORECAST,0) else 0 end) as closed_forecast

  from edge_forecast_hist efh
  left join edge_forecast_hist efhp on efhp.edge_vp = efh.edge_vp and efhp.edge_rm = efh.edge_rm and efhp.snapshot_date < efh.snapshot_date
 where efh.snapshot_date >= TRUNC(sysdate) - INTERVAL '70' DAY
   and efh.edge_asm != 'REDACTED'
   and efh.oppty_status in ('Open', 'Won')

group by efh.snapshot_date,
         substr(efh.edge_vp,1,instr(efh.edge_vp,'@oracle.com')-1),
         substr(efh.edge_rm,1,instr(efh.edge_rm,'@oracle.com')-1)
order by 1, 2, 3, 4

At the beginning, this query took ~5 minutes to execute.

Click on image to enlarge
enter image description here

After I created a primary key, as well as an index on every single column being referenced in this query, the execution time dropped to ~4 minutes, which is definitely an improvement, but not as good as I expected.
(This query only returns a couple hundred rows.)

When I tried to explain this query, I noticed that only a few of the indexes are actually being used.
(See three checkmarks indicating which indexes are being used, in the screenshot above.)

And there is some troubling language in the execution plan,
such as NESTED LOOPS and TABLE ACCESS (table scan??)

Click on image to enlarge
enter image description here

This is a terrible database, and I'm new to Oracle, and I don't totally understand the nuances of everything in the execution plan.
What appears to be the bottleneck here, and how might I mitigate it?

Some ideas that come to mind:

  • Self join (this shouldn't be a huge issue, any modern database should be able to handle this easily)
  • Inline case statements per column, in the select list, inside aggregate functions (SUM)

If it would really help, I can do several hours of work, trying to split those case statements out.
(i.e. computing results in multiple subqueries, with the case conditions moved into the where clauses.)
But I also assumed Oracle was smart enough to find the most efficient execution plan, without needing to be spoon-fed.
In other words, I don't want to do all that work, just to end up with the exact same execution plan.
I need to know what the actual problem is, so I can produce a targeted solution that properly addresses the root-cause.


Solution

  • This has some similarity as @Gordon Linoff but uses the additional analytical function of Rows Unbounded Preceding to create a cumulative sum which is what your query appears to do:

    SELECT snapshot_date, previous_snapshot_date, edge_vp, edge_rm
    , forecast, best, closed, pipeline
    , best + pipeline AS pipe_best
    , closed + forecast AS closed_forecast
    
    FROM
    (
      SELECT efh.snapshot_date
      , LAG(efh.snapshot_date) OVER (PARTITION BY efh.edge_vp, efh.edge_rm ORDER BY snapshot_date) as previous_snapshot_date
      , substr(efh.edge_vp,1,instr(efh.edge_vp,'@oracle.com')-1) as edge_vp
      , substr(efh.edge_rm,1,instr(efh.edge_rm,'@oracle.com')-1) as edge_rm
      , SUM(CASE WHEN efh.oppty_status = 'Open' THEN efh.ARR_FORECAST END) OVER (PARTITION BY efh.edge_vp, efh.edge_rm ORDER BY snapshot_date ROWS UNBOUNDED PRECEDING) as forecast
      , SUM(CASE WHEN efh.oppty_status = 'Open' THEN efh.ARR_BEST END) OVER (PARTITION BY efh.edge_vp, efh.edge_rm ORDER BY snapshot_date ROWS UNBOUNDED PRECEDING) as best
      , SUM(CASE WHEN efh.oppty_status = 'Won' THEN efh.ARR END) OVER (PARTITION BY efh.edge_vp, efh.edge_rm ORDER BY snapshot_date ROWS UNBOUNDED PRECEDING) as closed
      , SUM(CASE WHEN efh.oppty_status = 'Open' THEN efh.ARR_PIPELINE END) OVER (PARTITION BY efh.edge_vp, efh.edge_rm ORDER BY snapshot_date ROWS UNBOUNDED PRECEDING) as pipeline
    
      FROM edge_forecast_hist efh
    
      WHERE efh.snapshot_date >= TRUNC(sysdate) - INTERVAL '70' DAY
       AND efh.edge_asm != 'REDACTED'
       AND efh.oppty_status in ('Open', 'Won')
    )