Search code examples
oracleconnection-poolingspring-jdbcoracle-aqora-01000

Debugging ORA-01000


I'm altering a legacy Spring application to make use of dynamically configured connection pools. All queries in the entire system are using JdbcTemplate, or else are connecting through Jms to Oracle AQ -- I have checked and every single time a ResultSet is obtained it is closed in the framework code inside of a finally{} block. Prior to me moving over to the dynamically configured connection pools, the app never had any troubles. I am currently using C3P0 as my connection pool, but the same problems show up using DBCP2 and Atomikos pools as well. Here's the issue:

After processing for any length of time (it seems to vary based on what the app is doing but it always happens eventually) I run out of connections. Out of desperation I have boosted the connection count to 10k but that just prolongs the inevitable. I run this query:

select a.value, s.username, s.sid, s.serial# from v$sesstat a, v$statname b, v$session s
where a.statistic# = b.statistic#  and s.sid=a.sid and b.name = 'opened cursors current'
    and s.username = '<my app>' order by a.value asc ;

Sure enough,

 VALUE USERNAME                              SID    SERIAL#
------ ------------------------------ ---------- ----------
     1 MY_APP                                 69      16665
     2 MY_APP                                149       9703
     3 MY_APP                                 13      38401
     4 MY_APP                                100       8629
     4 MY_APP                                145      26291
    29 MY_APP                                 49      30425
  2997 MY_APP                                147      33539
  5317 MY_APP                                 52      12599
  6425 MY_APP                                102      14803
 10000 MY_APP                                 19      18469

Okay, I already knew I was running out of cursors. I know the next piece of advice I see on the web to find out what the runaway resource leak is, is to run something like the following query:

select sql_text, sid, count(*) from v$open_cursor where user_name = '<my app>'
group by sql_text, sid having count(*) > 1 ;

I get back a grand total of nine rows ... there's no count on these guys greater than three. If I remove the "having" clause and total up all the cursor counts, it amounts to about 120 or so cursors. Not 10k!

select sql_text, count(*) from v$open_cursor where sid = 19 group by sql_text having count(*) > 1 ;

SQL_TEXT                                                       COUNT(*)
--------------------------------------------------------     ----------
update seq$ set increment$=:2,minvalue=:3,maxvalue=:4,cycle#         41

This is the app implicitly getting a new sequence value from an insert statement that is being closed correctly (and also implicitly) by JdbcTemplate. Even so, 41 is not in the same range as 10k!

I know that the v$open_cursor table represents only "cached" cursors, and that some of the cursors counted in the "opened cursors current" statistic might be already marked as "closeable". Is there any way to figure out what exact process or sql is involved in these open cursors?

I have tried several approaches to trying to figure out what is going on through the application. I created a special datasource wrapper that would wrap all connections, statements, prepared statements, callable statements, and all their produced result sets, kept them in special collections in memory and removed them on close. I thought I would be able to catch it this way -- on the contrary, every single resource opened through this pool was closed. Every last one. I even debugged down inside the guts of the connection pool and spring engine to watch the Oracle thin ForwardOnlyResultSet get marked as closed.

I can almost suspect that the problem is in the interaction with AQ, except that this mechanism worked quite well in the older version of the app.

I've even tried working around the problem by explicitly catching the ORA-01000 error and using that as a trigger to hard-reset my connection pool. It works so long as it's working on simple tasks, but for certain large processes that are making thousands of database operations, I'm out of connections before the process finishes; hard-resetting the connection pool invalidates the transaction, and the whole thing goes into an infinite loop. Bad situation, all around.

I'm totally at a loss ... From the application side it looks completely water-tight, and from the db side, it doesn't look like I have a lot of open cursors by looking in v$open_cursor, only in the statistic. Unfortunately the statistic is the one that seems to matter.

Help?


Solution

  • Your first SQL shows that session 19 has 10,000 open cursors , and your 3rd SQL shows that only one of them is shared. Conclusion, session 19 has nearly 10,000 unique open cursors. The usual cause of this is using non-shareable SQL. ie, you're not using any bind variables in your code. That means that every time your app runs a SQL statement, the database opens a new cursor for that statement. When your app has run 10000 statements, you hit the wall.

    To work around this, you can set the database parameter CURSOR_SHARING=FORCE . That will cause the database to automatically substitute bind variables for literals in your SQL, making your SQL statements shareable. Then you won't have a separate cursor opened each time your app runs a SQL statement.

    Here's a short article on CURSOR_SHARING=FORCE: http://www.dba-oracle.com/t_cursor_sharing_force.htm

    If you're using Oracle 9i or above, you can use CURSOR_SHARING=SIMILAR instead. (Although some DBAs recommend not using SIMILAR until Oracle 11 or above, because bugs)

    Setting CURSOR_SHARING should be considered a quick fix, not a final solution. In the long term, you should rewrite your app so that it uses bind variables instead of using CURSOR_SHARING to compensate. Here's a nice explanation of why: https://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:5180609822543

    Finally - setting CURSOR_SHARING should be used with care and awareness that you might not get the optimzer plan you expected.