
JBoss Clustering and Lighttpd Load Balancing displaying inconsistent behaviour


PROBLEM

We have two JBoss AS 4.2.3 instances installed on separate machines, and they are clustered. We also use Lighttpd as a load balancer, placed between our Tomcat servers (the Tomcat servers are not clustered) and the JBoss servers. Once all servers are up and running, the application runs flawlessly. If I bring down one JBoss server, requests are redirected to the other server, as expected. My problem starts after I log out of the application. On trying to log back in, I get an exception saying that Tomcat cannot connect to the server that was brought down.

SERVER SETUP

  1. Machine01 - Tomcat7
  2. Machine02 - Tomcat7
  3. Machine03 - JBoss 4.2.3
  4. Machine04 - JBoss 4.2.3
  5. Machine05 - Lighttpd 1.4.28

OTHER INFORMATION

  • All machines use Ubuntu 12.04 OS.
  • JBoss machines are clustered.
  • The EAR is dropped in the all/deploy folder.
  • The JBoss AS instances are started using the following command - ./run.sh -b 0.0.0.0 -c all --partition=SomePartitionName &> /dev/null &.
  • Tomcat7 runs as a service, so it is started with sudo service tomcat7 start.
  • The Lighttpd is configured to work as a load balancer for the JBoss machines.
  • The following is the mod_proxy configuration on Lighttpd:

    server.modules += ( "mod_proxy" )
    proxy.balance = "fair"
    proxy.server = ( "" => (( "host" => "Machine03", "port" => 1100 ),
                            ( "host" => "Machine04", "port" => 1100 )) )
    
  • The jndi.properties file has the following entries (a sketch of how the client side builds its InitialContext from these is shown after this list):

    java.naming.factory.initial=org.jnp.interfaces.NamingContextFactory
    java.naming.factory.url.pkgs=org.jboss.naming:org.jnp.interfaces
    java.naming.provider.url=Machine05:80
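
For clarity, here is a minimal sketch of how a Tomcat-side client might build its InitialContext from the properties above; the class and method names are illustrative, not from our actual code. The provider URL points at the Lighttpd balancer (Machine05:80), which proxies to the JNDI port 1100 on the JBoss machines.

    import java.util.Properties;

    import javax.naming.Context;
    import javax.naming.InitialContext;
    import javax.naming.NamingException;

    // Hypothetical helper that mirrors the jndi.properties settings.
    public final class JndiContextFactory {

        public static Context createContext() throws NamingException {
            Properties props = new Properties();
            props.put(Context.INITIAL_CONTEXT_FACTORY,
                      "org.jnp.interfaces.NamingContextFactory");
            props.put(Context.URL_PKG_PREFIXES,
                      "org.jboss.naming:org.jnp.interfaces");
            // Machine05:80 is the Lighttpd balancer; it forwards to
            // port 1100 on Machine03 and Machine04.
            props.put(Context.PROVIDER_URL, "Machine05:80");
            return new InitialContext(props);
        }
    }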
    

I am unable to figure out why, after I bring down a machine and log out of the application, the Tomcat servers no longer hold a working proxy reference to the JBoss machines.


SOLUTION

  • I have got it solved now. Thanks to smallworld for pointing me in the right direction. What was happening was that we were caching the remote interfaces obtained from the JNDI lookup. From what I understand, such a remote interface points to only one particular server in the cluster. (We had assumed the remote interface would be smart enough to detect that a server was down; it turns out that smartness lives in the initial context, at the time you do the lookup.) So once a server is brought down, any EJB call made through that cached remote interface ends up trying to connect to the server that was brought down. To solve it, we stopped caching the remote interface and instead do a lookup every time we need the services of that EJB. If any server is down, the lookup returns the remote interface of a server that is up and running. With this, the cluster works flawlessly! So, your code should look something like this:

        // Somewhere at class level we have the following declarations.
        // jndiLookupMap maps lookup names to JNDI names and ctx is the
        // InitialContext; both are initialized elsewhere in the class.
        private static final Map<String, Object> remoteEJBHashMap =
                new HashMap<String, Object>(100, 0.9f);

        @SuppressWarnings("unchecked")
        public static final <T> T getEJBInterface(String jndiLookupName) {
            String jndiName = jndiLookupMap.get(jndiLookupName);
            T ejbInterface = null;
            //T ejbInterface = (T) remoteEJBHashMap.get(jndiLookupName);
            //if (ejbInterface == null) {
                try {
                    ejbInterface = (T) ctx.lookup(jndiName);
                } catch (NamingException e) {
                    throw new RuntimeException(e);
                }
                //remoteEJBHashMap.put(jndiLookupName, ejbInterface);
            //}
            return ejbInterface;
        }
    

    The commented lines are the ones that caused the issue. The only thing left for me to research is whether there is a better solution; one possible direction is sketched below.
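
    As one possible direction (my own sketch, not a tested solution), we could keep caching for performance but evict a cached proxy as soon as a call through it fails, so the next lookup resolves against a live server. All names below are illustrative, and it builds on the getEJBInterface method shown above:

        import java.util.Map;
        import java.util.concurrent.ConcurrentHashMap;

        // Hypothetical cache that callers can invalidate on a remote failure.
        private static final Map<String, Object> remoteEJBCache =
                new ConcurrentHashMap<String, Object>();

        @SuppressWarnings("unchecked")
        public static <T> T getCachedEJBInterface(String jndiLookupName) {
            T proxy = (T) remoteEJBCache.get(jndiLookupName);
            if (proxy == null) {
                // Fresh lookup through the method shown above.
                proxy = getEJBInterface(jndiLookupName);
                remoteEJBCache.put(jndiLookupName, proxy);
            }
            return proxy;
        }

        // On catching a remote exception, a caller evicts the entry and
        // retries once with a freshly looked-up proxy.
        public static void evictEJBInterface(String jndiLookupName) {
            remoteEJBCache.remove(jndiLookupName);
        }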