Search code examples
oraclelistenerfailovertnsdataguard

Oracle : JDBC TNS URL Connection won't failover because listener still responding


I'm facing a stupid problem, but after some time searching online and experimenting, I'm starting to lose hope.
I've got two Oracle DBs which are configured to be replicas, thanks to Dataguard.

I'm using a JDBC TNS URL to connect to my DBs, such as :

jdbc:oracle:thin:@
  (DESCRIPTION_LIST=
    (FAILOVER=on)
    (LOAD_BALANCE=off)
    (DESCRIPTION=
      (ADDRESS=
        (PROTOCOL=TCP)
        (HOST=DB1) primary
        (PORT=1521))
        (CONNECT_DATA=
          (SERVER=DEDICATED)
          (SID=MySID))
        )
    (DESCRIPTION=
      (ADDRESS=
        (PROTOCOL=TCP)
        (HOST=DB2) secondary
        (PORT=1521))
        (CONNECT_DATA=
          (SERVER=DEDICATED)
          (SID=MySID))
        )
    )

When I perform a switchover, the roles are exchanged : DB1 becomes secondary and DB2 becomes primary. DB1 is in mount status.
So far, so good.

But with my connection URL, I'm expecting to get a connection from DB2, which became primary but as the DB1 listener is still up, it acts as if everything is OK and I end up with an attempt to get a connection on DB1, which leads to the following error :

ORA-01033: ORACLE initialization or shutdown in progress

If I kill the listener, then the failover works and I got a connection from DB2.

But the whole point of dataguard is to perform automatic failover.
But if I'm forced to kill the listener :

  1. It's not what I was expecting :)
  2. The switchback may not work, as it's using listeners to do so

If anyone has a clue as for a correct configuration, I'm interested !

Thanks in advance.


Solution

  • After long hours of trying to find a proper solution, I'm pretty sure this mechanism strongly relies on the listener : the failover mechanism works fine only if the listener is stopped.
    Knowing that, I finally decided to implement my own solution, without touching to the application code.

    As I can't play with original listeners, since Dataguard use them to perform its operations, I duplicated all the listeners. For instance, for LISTENER_DB1 on port 1521, I created LISTENER_DB1_FO (FO stands for FailOver as you might have guessed) on port 1531.

    My configuration from the application point of view then becomes :

    jdbc:oracle:thin:@
      (DESCRIPTION_LIST=
        (FAILOVER=on)
        (LOAD_BALANCE=off)
        (DESCRIPTION=
          (ADDRESS=
            (PROTOCOL=TCP)
            (HOST=DB1) primary
            (PORT=1531))
            (CONNECT_DATA=
              (SERVER=DEDICATED)
              (SID=MySID))
            )
        (DESCRIPTION=
          (ADDRESS=
            (PROTOCOL=TCP)
            (HOST=DB2) secondary
            (PORT=1531))
            (CONNECT_DATA=
              (SERVER=DEDICATED)
              (SID=MySID))
            )
        )
    

    Thanks to a colleague who helped me a little on that, I wrote a script which checks if the database role is primary or not (works even if the DB is in mount state). And from that answer, my script will start or stop the associated listener.

    #! /bin/bash export ORACLE_HOME=<YOUR_HOME> export ORACLE_BIN=$ORACLE_HOME/bin/ DATABASE_ROLE() { export ORACLE_SID=$1 request='SELECT DATABASE_ROLE FROM V$DATABASE' result=`$ORACLE_BIN/sqlplus -silent / as sysdba << EOF set pages 0 feedback off
    ${request}; exit EOF` echo ${result} } for DBNAME in DB1 DB2 DB3 do $ORACLE_BIN/lsnrctl status LISTENER_${DBNAME}_FO > /dev/null return_status=$? if [ "$(DATABASE_ROLE ${DBNAME})" != 'PRIMARY' ];then echo "DB ${DBNAME} is secondary" if [ $return_status -eq 0 ];then $ORACLE_BIN/lsnrctl stop LISTENER_${DBNAME}_FO fi else echo "DB ${DBNAME} is primary" if [ $return_status -eq 1 ];then $ORACLE_BIN/lsnrctl start LISTENER_${DBNAME}_FO fi fi done

    Then I cronned that script. The only "drawback" is that the minimum interval between two cron executions is one minute. Your FailOver detection may then take 59 seconds to be detected, if you're unlucky.

    But we've been testing it for several days and it works like a charm.

    If anyone has the correct solution or a better idea, don't hesitate ! Thanks.