Tags: java, spring-cloud, netflix-eureka, netflix-zuul

Zuul: automatically rerouting incoming requests to another service instance when one is unavailable


I have configured Zuul with Eureka so that 3 identical instances of a service run in parallel. I call the gateway on port 8400, which routes incoming requests to ports 8420, 8430 and 8440 in a round-robin manner. This works smoothly. However, if I switch off one of the 3 services, a small number of incoming requests fail with the following exception:

com.netflix.zuul.exception.ZuulException: Filter threw Exception
    => 1: java.util.concurrent.FutureTask.report(FutureTask.java:122)
    => 3: hu.perit.spvitamin.core.batchprocessing.BatchProcessor.process(BatchProcessor.java:106)
    caused by: com.netflix.zuul.exception.ZuulException: Filter threw Exception
    => 1: com.netflix.zuul.FilterProcessor.processZuulFilter(FilterProcessor.java:227)
    caused by: org.springframework.cloud.netflix.zuul.util.ZuulRuntimeException: com.netflix.zuul.exception.ZuulException: Forwarding error
    => 1: org.springframework.cloud.netflix.zuul.filters.route.RibbonRoutingFilter.run(RibbonRoutingFilter.java:124)
    caused by: com.netflix.zuul.exception.ZuulException: Forwarding error
    => 1: org.springframework.cloud.netflix.zuul.filters.route.RibbonRoutingFilter.handleException(RibbonRoutingFilter.java:198)
    caused by: com.netflix.client.ClientException: com.netflix.client.ClientException
    => 1: com.netflix.client.AbstractLoadBalancerAwareClient.executeWithLoadBalancer(AbstractLoadBalancerAwareClient.java:118)
    caused by: java.lang.RuntimeException: org.apache.http.NoHttpResponseException: scalable-service-2:8430 failed to respond
    => 1: rx.exceptions.Exceptions.propagate(Exceptions.java:57)
    caused by: org.apache.http.NoHttpResponseException: scalable-service-2:8430 failed to respond
    => 1: org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:141)

My Zuul routing looks like this:

### Zuul routes
zuul.routes.scalable-service.path=/scalable/**
#Authorization header will be forwarded to scalable-service
zuul.routes.scalable-service.sensitiveHeaders=Cookie,Set-Cookie
zuul.routes.scalable-service.serviceId=template-scalable-service

It takes a while for Eureka to detect that the service is no longer available.

My question is: can Zuul be configured so that, when a NoHttpResponseException occurs, it forwards the request to another available instance in the pool?


Solution

  • I finally found the solution to the problem; the search phrase that led me to it was 'fault tolerance'. The key is the autoretry config in the application.properties file below. With 3 pooled services, template-scalable-service.ribbon.MaxAutoRetriesNextServer must be set to at least 6 to achieve full fault tolerance. With that setup I can kill 2 of the 3 services at any time and no incoming request fails. In the end I set it to 10; this causes no unnecessary increase in timeout, since Hystrix will break the circuit.

    ### Eureka config
    eureka.instance.hostname=${hostname:localhost}
    eureka.instance.instanceId=${eureka.instance.hostname}:${spring.application.name}:${server.port}
    eureka.instance.non-secure-port-enabled=false
    eureka.instance.secure-port-enabled=true
    eureka.instance.secure-port=${server.port}
    eureka.instance.lease-renewal-interval-in-seconds=5
    eureka.instance.lease-expiration-duration-in-seconds=10
    
    eureka.datacenter=perit.hu
    eureka.environment=${EUREKA_ENVIRONMENT_PROFILE:dev}
    eureka.client.serviceUrl.defaultZone=${EUREKA_SERVER:https://${server.fqdn}:${server.port}/eureka}
    eureka.client.server.waitTimeInMsWhenSyncEmpty=0
    eureka.client.registry-fetch-interval-seconds=5
    eureka.dashboard.path=/gui
    
    eureka.server.enable-self-preservation=false
    eureka.server.expected-client-renewal-interval-seconds=10
    eureka.server.eviction-interval-timer-in-ms=2000
    
    ### Ribbon
    ribbon.IsSecure=true
    ribbon.NFLoadBalancerPingInterval=5
    ribbon.ConnectTimeout=30000
    ribbon.ReadTimeout=120000
    
    ### Zuul config
    zuul.host.connectTimeoutMillis=30000
    zuul.host.socketTimeoutMillis=120000
    zuul.host.maxTotalConnections=2000
    zuul.host.maxPerRouteConnections=200
    zuul.retryable=true
    
    ### Zuul routes
    #template-scalable-service
    zuul.routes.scalable-service.path=/scalable/**
    #Authorization header will be forwarded to scalable-service
    zuul.routes.scalable-service.sensitiveHeaders=Cookie,Set-Cookie
    zuul.routes.scalable-service.serviceId=template-scalable-service
    # Autoretry config for template-scalable-service
    template-scalable-service.ribbon.MaxAutoRetries=0
    template-scalable-service.ribbon.MaxAutoRetriesNextServer=10
    template-scalable-service.ribbon.OkToRetryOnAllOperations=true
    
    #template-auth-service
    zuul.routes.auth-service.path=/auth/**
    #Authorization header will be forwarded to auth-service
    zuul.routes.auth-service.sensitiveHeaders=Cookie,Set-Cookie
    zuul.routes.auth-service.serviceId=template-auth-service
    # Autoretry config for template-auth-service
    template-auth-service.ribbon.MaxAutoRetries=0
    template-auth-service.ribbon.MaxAutoRetriesNextServer=0
    template-auth-service.ribbon.OkToRetryOnAllOperations=false
    
    ### Hystrix
    hystrix.command.default.execution.timeout.enabled=false
    

    In addition, I have a profile-specific setup in application-discovery.properties:

    #Microservice environment
    eureka.client.registerWithEureka=false
    eureka.client.fetchRegistry=true
    spring.cloud.loadbalancer.ribbon.enabled=true
    

    I start my server in a Docker container like this:

    services:
        discovery:
            container_name: discovery
            image: template-eureka
            environment:
                #agentlib for remote debugging
                - JAVA_OPTS=-DEUREKA_SERVER=https://discovery:8400/eureka -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=*:5005
                - TEMPLATE_EUREKA_OPTS=-Dspring.profiles.active=default,dev,discovery
                - EUREKA_ENVIRONMENT_PROFILE=dev
            ports:
                - '8400:8400'
                - '5500:5005'
    
            networks: 
                - back-tier-net
                - monitoring
            hostname: 'discovery'
    

    See the complete solution on GitHub.
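
To make the effect of the autoretry settings more concrete, here is a minimal Java sketch of how a retry budget lets a request skip a stopped instance. It is a simplified, single-threaded model and not Ribbon's actual code: the class RetrySketch and its methods are hypothetical names, and a "dead" port merely stands in for a NoHttpResponseException. It assumes the commonly cited upper bound of (MaxAutoRetries + 1) * (MaxAutoRetriesNextServer + 1) attempts per request, and the rule that only GET requests are retried unless OkToRetryOnAllOperations is true. Same-server retries are left out because MaxAutoRetries is 0 in the config above. The model does not reproduce the threshold of 6 reported above; in a real gateway the round-robin position is shared between concurrent requests and the server list can be stale, so more retries may be needed than this sketch suggests.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Simplified, single-threaded model of Ribbon-style retries (not Ribbon's own code). */
final class RetrySketch {

    /** Upper bound on attempts per request: tries per server times servers tried. */
    static int maxAttempts(int maxAutoRetries, int maxAutoRetriesNextServer) {
        return (maxAutoRetries + 1) * (maxAutoRetriesNextServer + 1);
    }

    /** Only GET is retried unless OkToRetryOnAllOperations is true. */
    static boolean canRetry(String httpMethod, boolean okToRetryOnAllOperations) {
        return "GET".equalsIgnoreCase(httpMethod) || okToRetryOnAllOperations;
    }

    /**
     * Tries instances round-robin from startIndex and returns the first port
     * that is alive, or -1 once the retry budget is used up.
     */
    static int route(List<Integer> ports, Set<Integer> deadPorts, int startIndex,
                     String httpMethod, int maxAutoRetriesNextServer,
                     boolean okToRetryOnAllOperations) {
        // Without permission to retry, only the first chosen server is tried.
        int serversToTry = canRetry(httpMethod, okToRetryOnAllOperations)
                ? maxAutoRetriesNextServer + 1
                : 1;
        for (int i = 0; i < serversToTry; i++) {
            int port = ports.get((startIndex + i) % ports.size());
            if (!deadPorts.contains(port)) {
                return port; // this instance answered
            }
            // A dead instance stands in for NoHttpResponseException: move on.
        }
        return -1;
    }

    public static void main(String[] args) {
        List<Integer> ports = Arrays.asList(8420, 8430, 8440);
        Set<Integer> dead = new HashSet<>(Arrays.asList(8430));
        System.out.println("max attempts: " + maxAttempts(0, 10));
        System.out.println("GET with retries: " + route(ports, dead, 1, "GET", 10, false));
        System.out.println("GET without retries: " + route(ports, dead, 1, "GET", 0, false));
        System.out.println("POST, retry-all off: " + route(ports, dead, 1, "POST", 10, false));
    }
}
```

In this model, a GET that first lands on the stopped 8430 instance is answered by 8440 as soon as at least one next-server retry is allowed, while a POST fails unless OkToRetryOnAllOperations is true. That flag is worth enabling only for services whose non-GET operations are safe to repeat, since a retried request may already have been partly processed by the instance that failed to respond.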