spring load-balancing netflix-zuul netflix-ribbon

Load balancing fails when a server is down

I have written a simple set of microservices: a Zuul gateway (with Ribbon) listening on port 7070, in front of two HomeService instances running on ports 8080 and 8081.

I have added spring-boot-starter-actuator to every service in order to expose the /health endpoint.

In the Zuul/Ribbon configuration I have added:

zuul:
  ignoredServices: "*"
  routes:
    home-service:
      path: /service/**
      serviceId: home-service
      retryable: true

home-service:
  ribbon:
    listOfServers: localhost:8080,localhost:8081
    eureka.enabled: false
    ServerListRefreshInterval: 1

With this configuration, each time a client calls GET http://localhost:7070/service/home, the load balancer picks one of the two HomeService instances, running on port 8080 or 8081, and calls its /home endpoint.

However, when one of the HomeService instances is shut down, the load balancer does not seem to notice (in spite of the ServerListRefreshInterval setting), and the request fails with error=500 whenever it is routed to the stopped instance.
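
To illustrate what I think is happening, here is a simplified model in plain Java. It is not Ribbon's actual code: the class name StaticRoundRobin, the round-robin selection, and the absence of any health check are assumptions made for the sketch. It only shows that re-reading a static listOfServers cannot remove an instance that has gone down.

```java
import java.util.List;
import java.util.Set;

// Simplified model (assumption, not Ribbon's implementation):
// a round-robin balancer over a statically configured server list.
class StaticRoundRobin {
    private final List<String> configured;
    private int next = 0;

    StaticRoundRobin(List<String> configured) {
        this.configured = List.copyOf(configured);
    }

    // "Refreshing" a static list just returns the same configured entries,
    // so a stopped instance is never dropped, however often this runs.
    List<String> refresh() {
        return configured;
    }

    // Pick the next server in rotation without checking whether it is alive.
    String choose() {
        List<String> servers = refresh();
        String server = servers.get(next % servers.size());
        next++;
        return server;
    }

    // Count how many of `requests` calls land on a server listed in `down`.
    static int countFailures(List<String> servers, Set<String> down, int requests) {
        StaticRoundRobin lb = new StaticRoundRobin(servers);
        int failures = 0;
        for (int i = 0; i < requests; i++) {
            if (down.contains(lb.choose())) {
                failures++;
            }
        }
        return failures;
    }
}
```

In this model, with localhost:8081 stopped, every second request still goes to the stopped instance, which matches the intermittent 500 errors I see.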

How could I fix it?

Solution

I received a solution from the Spring Cloud team and tested it.

The solution is available on GitHub.

To summarize:

I added org.springframework.retry:spring-retry to my Zuul classpath.
I added @EnableRetry to my Zuul application.
I put the following properties in my Zuul configuration.

application.yml

server:
  port: ${PORT:7070}

spring:
  application:
    name: gateway

endpoints:
  health:
    enabled: true
    sensitive: true
  restart:
    enabled: true
  shutdown:
    enabled: true

zuul:
  ignoredServices: "*"
  routes:
    home-service:
      path: /service/**
      serviceId: home-service
      retryable: true
  retryable: true

home-service:
  ribbon:
    listOfServers: localhost:8080,localhost:8081
    eureka.enabled: false
    ServerListRefreshInterval: 100
    retryableStatusCodes: 500
    MaxAutoRetries: 2
    MaxAutoRetriesNextServer: 1
    OkToRetryOnAllOperations: true
    ReadTimeout: 10000
    ConnectTimeout: 10000
    EnablePrimeConnections: true

ribbon:
  eureka:
    enabled: false

hystrix:
  command:
    default:
      execution:
        isolation:
          thread:
            timeoutInMilliseconds: 30000
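
As I understand the Ribbon settings above, MaxAutoRetries is the number of retries on the same server (not counting the first try) and MaxAutoRetriesNextServer is the number of additional servers to try (not counting the first server). The sketch below models that behaviour in plain Java. It is a simplified illustration, not Ribbon's or Zuul's actual code: the class RetrySketch, its method names, and the fixed rotation order are hypothetical.

```java
import java.util.List;
import java.util.Set;

// Simplified model of retry-with-failover (assumption, not Ribbon's code).
class RetrySketch {

    // Returns the server that finally answered, or null if every attempt failed.
    // `start` is the index of the server the balancer picked first.
    static String call(List<String> servers, Set<String> down, int start,
                       int maxAutoRetries, int maxAutoRetriesNextServer) {
        // Outer loop: the first server plus up to maxAutoRetriesNextServer others.
        for (int hop = 0; hop <= maxAutoRetriesNextServer; hop++) {
            String server = servers.get((start + hop) % servers.size());
            // Inner loop: the first try plus up to maxAutoRetries retries.
            for (int attempt = 0; attempt <= maxAutoRetries; attempt++) {
                if (!down.contains(server)) {
                    return server; // request succeeded on this server
                }
                // In this model a stopped server stays stopped, so same-server
                // retries fail and the outer loop moves to the next server.
            }
        }
        return null; // all attempts exhausted
    }

    // Upper bound on the number of attempts for one incoming request.
    static int maxAttempts(int maxAutoRetries, int maxAutoRetriesNextServer) {
        return (maxAutoRetries + 1) * (maxAutoRetriesNextServer + 1);
    }
}
```

With the values from the configuration (MaxAutoRetries: 2, MaxAutoRetriesNextServer: 1), a request first routed to a stopped localhost:8080 is tried three times there and then succeeds on localhost:8081, for at most six attempts in total. With both values at 0, the same request would fail outright in this model.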