performance oracle websphere

What is causing spikes for JDBC calls to Oracle from within Websphere?


I was wondering whether someone can shed some light on the following issue:

We've been seeing spikes in JDBC calls from a Spring 2.5.6 based web service running on WebSphere 6.1 on AIX, calling into Oracle 64-bit 10.2.0.5.0. The JDBC driver version is 10.2.0.3.0.

We're hitting the database with a single thread. The average response time for the web service is 16ms, but we're seeing 11 spikes of about 1 second or higher (amongst about 11,000 calls in 5 minutes). Introscope is telling us that about half of these spikes are caused by "select 1 from dual" (which the WebSphere connection pool uses to validate the connection).

On the database side, we've traced the sessions created by the WebSphere connection pool, and the traces do not indicate any spikes inside the database.

Any ideas/suggestions on what could be causing these spikes?

EDIT:

Our connection pool is set up with 20 connections, and monitoring is showing that only one connection is used.

EDIT2:

We've upgraded our Oracle JDBC driver to 10.2.0.5 with no difference.


Solution

  • The answer to this problem ended up not being related to WebSphere or Oracle; it was a good old-fashioned network configuration problem that resulted in TCP retransmission timeouts between the WebSphere server and the Oracle RAC cluster.

    To arrive at that diagnosis, I compared the output of netstat -p tcp before and after a test run and found that the "retransmit timeouts" stat was increasing. The Retransmission Timeout Algorithm configuration can be viewed using:

    $ no -a
    ...
                     rto_high = 64
                   rto_length = 13
                    rto_limit = 7
                      rto_low = 1
    

    Which indicates that retransmission timeouts range from 1 to 64 seconds and back off with increasing delay. That explains why we saw spikes of 1 second, 2 seconds, 4 seconds, 10 seconds and 22 seconds but nothing between these peaks (i.e. no 6-second spike).
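
To illustrate why back-off produces delays clustered at a few discrete values rather than a continuum, here is a simplified sketch of capped exponential back-off using the rto_low and rto_high values shown above. It is a generic doubling model and an assumption on my part, not AIX's actual algorithm (which also takes rto_length and rto_limit into account), so it is not expected to reproduce the exact 10 and 22 second peaks. The helper name backoff_schedule is hypothetical.

```python
def backoff_schedule(rto_low, rto_high, attempts):
    """Return successive retransmission timeouts in seconds.

    Simplified model (assumption): start at rto_low, double after each
    timeout, and never exceed rto_high. rto_length and rto_limit are
    deliberately ignored here.
    """
    timeouts = []
    current = rto_low
    for _ in range(attempts):
        timeouts.append(current)
        current = min(current * 2, rto_high)
    return timeouts


if __name__ == "__main__":
    # Values taken from the `no -a` output above.
    print(backoff_schedule(rto_low=1, rto_high=64, attempts=8))
    # prints [1, 2, 4, 8, 16, 32, 64, 64]
```

Under this model a stalled call can only be delayed by one of a small set of values, which matches the observation that spikes appear at specific durations and not in between.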

    Once the network config was fixed, the problem went away.
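
The before/after comparison of netstat -p tcp counters described above can be automated along these lines. This is a minimal sketch: it assumes each counter line has the form "<number> <description>", the sample snapshots are made up for illustration, and parse_tcp_counters and counter_deltas are hypothetical helper names. Real netstat output differs between platforms, so the parsing may need adjusting.

```python
import re


def parse_tcp_counters(netstat_output):
    """Map counter descriptions to integer values.

    Assumes lines such as "    5 retransmit timeouts". Parenthesised
    details like "(12345 bytes)" are dropped so that the description
    stays the same between snapshots.
    """
    counters = {}
    for line in netstat_output.splitlines():
        match = re.match(r"\s*(\d+)\s+(.+?)\s*$", line)
        if match:
            name = re.sub(r"\s*\(.*\)", "", match.group(2))
            counters[name] = int(match.group(1))
    return counters


def counter_deltas(before, after):
    """Return only the counters that changed between two snapshots."""
    old = parse_tcp_counters(before)
    new = parse_tcp_counters(after)
    return {
        name: value - old.get(name, 0)
        for name, value in new.items()
        if value != old.get(name, 0)
    }


# Made-up sample snapshots, not real measurements.
BEFORE = """tcp:
        1000 packets sent
        5 retransmit timeouts
        7 connection requests
"""
AFTER = """tcp:
        12500 packets sent
        9 retransmit timeouts
        7 connection requests
"""

if __name__ == "__main__":
    for name, delta in counter_deltas(BEFORE, AFTER).items():
        print(f"{name}: +{delta}")
```

In practice the two snapshots would be captured by running netstat -p tcp immediately before and after the load test and saving the output to files; a growing "retransmit timeouts" delta points at the network rather than the application or database.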