Search code examples
oracle-databaseoracle11g

Same SQL_ID using different size of temp tablespace


Our database suffer with this error a few days ago

ORA-01652: unable to extend temp segment by 128 in tablespace TEMP

We found a sql statement will consume a lot of temp tablespace(>20GB), but it was normal earlier(<10MB).

  1. Used GATHER_TABLE_STATS() on those table in the sql have query. Didn't help.
  2. After gathering table statistics many hours later, the issue disappeared.
  3. The same binding variable which use large temp tablespace previously, now can be normally execute and consume a little temp space.
  4. The sql_id have multiple records in v$sql.

Maybe something wrong in particular plan? How Can I identify it?


Solution

  • To diagnose historical temporary tablespace problems, you'll need to use AWR (active workload repository) tables.

    First, check that your system retention period covers the date the problem happened. The default configuration is 8 days.

    select * from dba_hist_wr_control;
    

    Next, assuming that your problem falls within your retention period, check for the SQL statements that consumed a large amount of temporary tablespace around the time the error occurred. Note that it's possible that the query you care about did not cause the problem. If another query consumed all the temporary tablespace, your query would also fail even if it only needed a small amount.

    select username, event, sql_id, plan_hash_value, temp_space_allocated/1024/1024/1024 gb,
        dba_hist_active_sess_history.*
    from dba_hist_active_sess_history
    join dba_users on dba_hist_active_sess_history.user_id = dba_users.user_id
    where sample_time >= sysdate - 3
        --Find queries that used more than 1 GB of temporary tablespace.
        and temp_space_allocated > 1024*1024*1024*1
    order by sample_time desc;
    

    If your query is indeed the culprit, check for execution plan changes, which is the main reason why query performance would radically change. The same SQL_ID likely generates multiple plan hash values that will display in the below query.

    select * from table(dbms_xplan.display_awr(sql_id => 'SQL_ID from above'));
    

    Compare the before and after execution plans. Pay special attention to the "Note" section of the execution plan. One popular reason for plans to change is adaptive reoptimization features. The optimizer learns from mistakes, and can adjust plans based on previous runs. For example, you might see "Cardinality feedback" on the second execution plan, but not the first. If that's the case, then Oracle fixed the problem and you likely don't need to worry about it again. (Unless the good execution plan gets forgotten, and Oracle tries the bad plan again. In that case, you might need to use a plan control feature to force the good plan.)

    The rows returned by dba_hist_active_sess_history can also tell you which operation consumed the most time and resources, which tells you what part of the execution plan went wrong. That data can be difficult to read because you're creating like a historical version of a SQL Monitor Report. It's much easier to diagnose these problems when they're happening, because you can just run something like select dbms_sqltune.report_sql_monitor(sql_id => 'Your SQL_ID') from dual;

    This troubleshooting process can take a long time, and I've only covered a few possible reasons. My answer is already making some guesses, and if I'm totally wrong, edit the original question with the debugging data.