Tags: spring, load-balancing, netflix-zuul, netflix-ribbon

Load balancing fails when a server is down


I have written a simple set of microservices with the following architecture (diagram not reproduced here).

I have added spring-boot-starter-actuator to every service in order to expose the /health endpoint.

In the Zuul/Ribbon configuration I have added:

zuul:
  ignoredServices: "*"
  routes:
    home-service:
      path: /service/**
      serviceId: home-service
      retryable: true

home-service:
  ribbon:
    listOfServers: localhost:8080,localhost:8081
    eureka.enabled: false
    ServerListRefreshInterval: 1

With this configuration, each time a client calls GET http://localhost:7070/service/home, the load balancer chooses one of the two HomeService instances, running on port 8080 or 8081, and calls its /home endpoint.

But when one of the HomeService instances is shut down, the load balancer does not seem to notice (in spite of the ServerListRefreshInterval setting) and fails with error=500 whenever it tries to call the stopped instance.
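
The behaviour described here can be modelled with a short sketch. It assumes plain round-robin selection over a fixed server list with no liveness check; `StaticRoundRobin`, `choose` and `call` are hypothetical names for illustration, not Ribbon classes, and Ribbon's real rule and ping logic are more involved.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

// Simplified model of round-robin selection over a static server list.
// This is NOT Ribbon's implementation; all names here are hypothetical.
class StaticRoundRobin {
    private final List<String> servers;
    private final AtomicInteger next = new AtomicInteger();

    StaticRoundRobin(List<String> servers) {
        this.servers = servers;
    }

    // Picks the next server in rotation; no liveness check is performed.
    String choose() {
        int index = Math.floorMod(next.getAndIncrement(), servers.size());
        return servers.get(index);
    }

    // Simulates one proxied call: 200 if the chosen server is up, 500 otherwise.
    int call(Set<String> downServers) {
        return downServers.contains(choose()) ? 500 : 200;
    }

    public static void main(String[] args) {
        StaticRoundRobin lb =
                new StaticRoundRobin(List.of("localhost:8080", "localhost:8081"));
        Set<String> down = Set.of("localhost:8081"); // the stopped instance
        for (int i = 0; i < 4; i++) {
            System.out.println(lb.call(down)); // prints 200, 500, 200, 500
        }
    }
}
```

In this model every second request lands on the stopped instance, which matches the intermittent 500 responses described here.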

How can I fix this?


Solution

  • I received and tested a solution from the Spring Cloud team.

    The full solution is on GitHub.

    To summarize:

    • I added the org.springframework.retry:spring-retry dependency to my Zuul classpath
    • I added @EnableRetry to my Zuul application class
    • I put the following properties in my Zuul configuration:

    application.yml

    server:
      port: ${PORT:7070}
    
    spring:
      application:
        name: gateway
    
    endpoints:
      health:
        enabled: true
        sensitive: true
      restart:
        enabled: true
      shutdown:
        enabled: true
    
    zuul:
      ignoredServices: "*"
      routes:
        home-service:
          path: /service/**
          serviceId: home-service
          retryable: true
      retryable: true
    
    home-service:
      ribbon:
        listOfServers: localhost:8080,localhost:8081
        eureka.enabled: false
        ServerListRefreshInterval: 100
        retryableStatusCodes: 500
        MaxAutoRetries: 2
        MaxAutoRetriesNextServer: 1
        OkToRetryOnAllOperations: true
        ReadTimeout: 10000
        ConnectTimeout: 10000
        EnablePrimeConnections: true
    
    ribbon:
      eureka:
        enabled: false
    
    hystrix:
      command:
        default:
          execution:
            isolation:
              thread:
                timeoutInMilliseconds: 30000
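
To make the retry settings above more concrete, here is a minimal sketch of how MaxAutoRetries and MaxAutoRetriesNextServer combine, assuming the commonly documented semantics: MaxAutoRetries counts extra attempts on the same server, and MaxAutoRetriesNextServer counts how many further servers are tried, in both cases excluding the first try. `RetryModel`, `Result` and `execute` are hypothetical names; this is a simulation, not the code Ribbon or spring-retry actually runs.

```java
import java.util.List;
import java.util.Set;

// Simplified model of same-server and next-server retries.
// NOT the real Ribbon/spring-retry logic; all names are hypothetical.
class RetryModel {

    // Outcome of one logical request: which server answered and how many tries it took.
    static final class Result {
        final String server;   // null if every attempt failed
        final int attempts;

        Result(String server, int attempts) {
            this.server = server;
            this.attempts = attempts;
        }
    }

    static Result execute(List<String> servers, Set<String> down, int startIndex,
                          int maxAutoRetries, int maxAutoRetriesNextServer) {
        int attempts = 0;
        // First server plus up to maxAutoRetriesNextServer further servers.
        for (int s = 0; s <= maxAutoRetriesNextServer; s++) {
            String server = servers.get((startIndex + s) % servers.size());
            // First try plus up to maxAutoRetries retries on the same server.
            for (int retry = 0; retry <= maxAutoRetries; retry++) {
                attempts++;
                if (!down.contains(server)) {
                    return new Result(server, attempts);
                }
            }
        }
        return new Result(null, attempts);
    }

    public static void main(String[] args) {
        List<String> servers = List.of("localhost:8080", "localhost:8081");
        Set<String> down = Set.of("localhost:8081");
        // Start on the stopped instance, using MaxAutoRetries=2, MaxAutoRetriesNextServer=1.
        Result r = execute(servers, down, 1, 2, 1);
        System.out.println(r.server + " after " + r.attempts + " attempts");
        // prints "localhost:8080 after 4 attempts"
    }
}
```

Under these assumptions a request that first lands on the stopped instance is tried there three times and then succeeds on the other instance, so the client no longer sees the 500. The worst case is (2 + 1) × (1 + 1) = 6 attempts, which is worth keeping in mind when choosing the ReadTimeout, ConnectTimeout and Hystrix timeout values.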