Tags: netflix-zuul, spring-cloud-netflix, spring-retry, netflix-ribbon

Ribbon Retry properties not respected


I have a Zuul gateway application that receives requests from a client app and forwards them, using a load-balanced RestTemplate, to a microservice with two endpoints, say endpoint1 and endpoint2. Requests are balanced between the two endpoints round-robin, which is okay for now, although I eventually want availability-based balancing.

Here are the issues I am facing:

  • I brought down one of the endpoints, say endpoint2, and called the Zuul route. When a request goes to endpoint2, Zuul takes about 2 minutes before failing with HTTP 503, and it does not retry the request against the other endpoint. The error is just cascaded back to the caller.
  • Even after setting the read timeout and connect timeout configurations, Ribbon does not seem to respect them. It still takes 2 minutes to throw the error from the server.
  • I tried enabling logs at the Netflix package level, but I can't see any logs unless I pass a custom HTTP client to the RestTemplate.

I am new to the Netflix stack of components, so please advise if I am missing something obvious. Thanks!
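The first bullet can be made concrete. With plain round-robin, every second request lands on the stopped endpoint. An availability-aware choice instead skips servers that are currently known to be down. The sketch below is a simplified, hypothetical illustration of that difference. `RoundRobinSketch` and the `up` set are assumptions for this example, not Ribbon's `RoundRobinRule` or `AvailabilityFilteringRule`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, simplified server chooser for illustration; not Ribbon's implementation.
public class RoundRobinSketch {
    private final List<String> servers;
    private final AtomicInteger next = new AtomicInteger();

    public RoundRobinSketch(List<String> servers) {
        this.servers = new ArrayList<>(servers);
    }

    // Plain round-robin: cycles through every server, whether it is up or not.
    public String choose() {
        int index = Math.floorMod(next.getAndIncrement(), servers.size());
        return servers.get(index);
    }

    // Availability-aware variant: follows the same cycle but skips servers
    // that are not in the `up` set. Returns null if no server is available.
    public String chooseAvailable(Set<String> up) {
        for (int attempt = 0; attempt < servers.size(); attempt++) {
            String candidate = choose();
            if (up.contains(candidate)) {
                return candidate;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> endpoints = Arrays.asList("endpoint1", "endpoint2");
        Set<String> up = Collections.singleton("endpoint1"); // endpoint2 is down

        RoundRobinSketch plain = new RoundRobinSketch(endpoints);
        RoundRobinSketch aware = new RoundRobinSketch(endpoints);
        for (int i = 0; i < 4; i++) {
            // plain alternates endpoint1/endpoint2; aware always returns endpoint1
            System.out.println(plain.choose() + " vs " + aware.chooseAvailable(up));
        }
    }
}
```

In Ribbon itself, the rule is selected per client (for example via `<client>.ribbon.NFLoadBalancerRuleClassName`). Whether a server counts as "down" is decided from health or failure statistics, not from a hand-maintained set.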

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.mycomp</groupId>
    <artifactId>zuul-gateway</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <packaging>jar</packaging>

    <name>zuul-gateway</name>
    <description>Spring Boot Zuul</description>

    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>1.5.9.RELEASE</version>
        <relativePath/> <!-- lookup parent from repository -->
    </parent>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
        <java.version>1.8</java.version>
        <spring-cloud.version>Edgware.SR1</spring-cloud.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.springframework.cloud</groupId>
            <artifactId>spring-cloud-starter-zuul</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.cloud</groupId>
            <artifactId>spring-cloud-starter-eureka</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.cloud</groupId>
            <artifactId>spring-cloud-starter-netflix-ribbon</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>com.amazonaws</groupId>
            <artifactId>aws-java-sdk-lambda</artifactId>
            <version>1.11.242</version>
        </dependency>
        <dependency>
            <groupId>org.codehaus.groovy</groupId>
            <artifactId>groovy-all</artifactId>
            <version>2.3.10</version>
        </dependency>
        <dependency>
            <groupId>com.netflix.netflix-commons</groupId>
            <artifactId>netflix-commons-util</artifactId>
            <version>0.1.1</version>
        </dependency>
        <dependency>
            <groupId>org.springframework.retry</groupId>
            <artifactId>spring-retry</artifactId>
        </dependency>
    </dependencies>

    <dependencyManagement>
        <dependencies>
            <dependency>
                <groupId>org.springframework.cloud</groupId>
                <artifactId>spring-cloud-dependencies</artifactId>
                <version>${spring-cloud.version}</version>
                <type>pom</type>
                <scope>import</scope>
            </dependency>
        </dependencies>
    </dependencyManagement>

    <build>
        <plugins>
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
            </plugin>
        </plugins>
    </build>
</project>

And my application.yml looks like below:

eureka:
  client:
    healthcheck:
      enabled: true
    lease:
      duration: 5
    service-url:
      defaultZone: http://localhost:8761/eureka/

ingestWithOutEureka:
  ribbon:
    eureka:
      enabled: false
    NIWSServerListClassName: com.netflix.loadbalancer.ConfigurationBasedServerList
    listOfServers: http://demo-nlb-6a67d59c901ecd128.elb.us-west-2.amazonaws.com,http://demo-nlb-124321w2a123ecd128.elb.us-west-2.amazonaws.com
    okToRetryOnAllOperations: true
    ConnectTimeout: 500
    ReadTimeout: 1000
    MaxAutoRetries: 5
    MaxAutoRetriesNextServer: 5
    MaxTotalHttpConnections: 500
    MaxConnectionsPerHost: 100
    retryableStatusCodes: 404,503
    okhttp:
      enabled: true

zuul:
  debug:
    request: true
    parameter: true
  ignored-services: '*'
  routes:
    ingestServiceELB:
      path: /ingestWithoutEureka/ingest/**
      retryable: true
      url: http://dummyURL

management.security.enabled: false

spring:
  application:
    name: zuul-gateway
  cloud:
    loadbalancer:
      retry:
        enabled: true

logging:
  level:
    org:
      apache:
        http: DEBUG
    com:
      netflix: DEBUG

hystrix:
  command:
    default:
      execution:
        isolation:
          strategy: THREAD
          thread:
            timeoutInMilliseconds: 60000
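It helps to see how large the retry budget in the Ribbon settings above is. Ribbon's total number of attempts is commonly described as `(MaxAutoRetries + 1) * (MaxAutoRetriesNextServer + 1)`: the first try plus the same-server retries, repeated for each server tried. With `MaxAutoRetries: 5` and `MaxAutoRetriesNextServer: 5`, that comes to 36 attempts. If every attempt ran into `ReadTimeout: 1000`, the request could take on the order of 36 seconds, ignoring connect time and backoff. The helper below only does that arithmetic, and its class and method names are hypothetical.

```java
// Hypothetical helper that only computes the commonly cited Ribbon retry arithmetic.
public class RetryBudget {
    // (first try + same-server retries) for the first server and for each "next" server.
    static int totalAttempts(int maxAutoRetries, int maxAutoRetriesNextServer) {
        return (maxAutoRetries + 1) * (maxAutoRetriesNextServer + 1);
    }

    // Rough worst case if every attempt hits the read timeout
    // (ignores connect time, backoff and any outer timeout).
    static long worstCaseMillis(int maxAutoRetries, int maxAutoRetriesNextServer, long readTimeoutMs) {
        return totalAttempts(maxAutoRetries, maxAutoRetriesNextServer) * readTimeoutMs;
    }

    public static void main(String[] args) {
        System.out.println(totalAttempts(5, 5));         // 36
        System.out.println(worstCaseMillis(5, 5, 1000)); // 36000
    }
}
```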

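And my application class looks like below:

import com.zuulgateway.filter.InterceptionFilter;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.client.discovery.EnableDiscoveryClient;
import org.springframework.cloud.netflix.zuul.EnableZuulProxy;
import org.springframework.context.annotation.Bean;

@SpringBootApplication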
@EnableZuulProxy
@EnableDiscoveryClient
public class ZuulGatewayApplication {

    @Bean
    public InterceptionFilter addInterceptionFilter() {
        return new InterceptionFilter();
    }

    public static void main(String[] args) {
        SpringApplication.run(ZuulGatewayApplication.class, args);
    }
}

And lastly, my Zuul filter looks like below:

package com.zuulgateway.filter;

import com.netflix.zuul.ZuulFilter;
import com.netflix.zuul.context.RequestContext;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.cloud.client.discovery.DiscoveryClient;
import org.springframework.cloud.client.loadbalancer.LoadBalanced;
import org.springframework.cloud.client.loadbalancer.LoadBalancerClient;
import org.springframework.context.annotation.Bean;
import org.springframework.web.client.RestTemplate;

import javax.servlet.http.HttpServletRequest;
import java.io.IOException;
import java.util.stream.Collectors;

public class InterceptionFilter extends ZuulFilter{
    private static final String REQUEST_PATH = "/ingestWithoutEureka";

    @LoadBalanced
    @Bean
    RestTemplate loadBalanced() {
        //RestTemplate restTemplate = new RestTemplate(new HttpComponentsClientHttpRequestFactory());
        RestTemplate restTemplate = new RestTemplate();
        return restTemplate;
    }

    @Autowired
    @LoadBalanced
    private RestTemplate loadBalancedRestTemplate;

    @Override
    public String filterType() {
        return "route";
    }

    @Override
    public int filterOrder() {
        return 0;
    }

    @Override
    public boolean shouldFilter() {
        RequestContext context = RequestContext.getCurrentContext();
        HttpServletRequest request = context.getRequest();
        String method = request.getMethod();
        String requestURI = request.getRequestURI();
        return requestURI.startsWith(REQUEST_PATH);
    }

    @Override
    public Object run() {

        RequestContext ctx = RequestContext.getCurrentContext();
        try {
            String requestPayload = ctx.getRequest().getReader().lines().collect(Collectors.joining(System.lineSeparator()));

            String response = loadBalancedRestTemplate.postForObject("http://ingestWithOutEureka/ingest", requestPayload, String.class);

            ctx.setResponseStatusCode(200);
            ctx.setResponseBody(response);
        } catch (IOException e) {
            ctx.setResponseStatusCode(500);
            ctx.setResponseBody("{ \"error\" : " + e.getMessage() + " }");
            System.out.println("Exception during feign call - " + e.getMessage());
            e.printStackTrace();
        } finally {
            ctx.setSendZuulResponse(false);
            ctx.getResponse().setContentType("application/json");
        }

        return null;
    }
}

Solution

  • Here are the solutions that worked for me:

    Issue 1 - Retry was not working in spite of configuring <client>.ribbon.OkToRetryOnAllOperations: true. Ribbon was clearly ignoring my configuration.

    Solution 1 - Strangely, after some debugging I noticed that Ribbon picked up the client-level configuration only if a global configuration was present as well.

    Once I set the global OkToRetryOnAllOperations to either true or false, as shown below, Ribbon started picking up <client>.ribbon.OkToRetryOnAllOperations as expected and I could see the retries happening.

    ribbon:
      OkToRetryOnAllOperations: false
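The author expected the usual resolution order. A client-specific key such as `ingestWithOutEureka.ribbon.OkToRetryOnAllOperations` wins, the global `ribbon.OkToRetryOnAllOperations` applies otherwise, and a built-in default applies last. The sketch below models only that expected order over a plain `Map`. It is a hypothetical illustration (`RibbonKeyLookup` and `resolve` are invented names), not Archaius or Ribbon code, and it does not reproduce the quirk described above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of client-level -> global -> default property resolution.
public class RibbonKeyLookup {
    // Looks up "<client>.ribbon.<key>", then "ribbon.<key>", then falls back to the default.
    // Keys are matched exactly, including case.
    static String resolve(Map<String, String> props, String client, String key, String defaultValue) {
        String clientKey = client + ".ribbon." + key;
        if (props.containsKey(clientKey)) {
            return props.get(clientKey);
        }
        String globalKey = "ribbon." + key;
        if (props.containsKey(globalKey)) {
            return props.get(globalKey);
        }
        return defaultValue;
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        props.put("ribbon.OkToRetryOnAllOperations", "false");
        props.put("ingestWithOutEureka.ribbon.OkToRetryOnAllOperations", "true");
        System.out.println(resolve(props, "ingestWithOutEureka", "OkToRetryOnAllOperations", "false")); // true
        System.out.println(resolve(props, "otherClient", "OkToRetryOnAllOperations", "false"));         // false
    }
}
```

This sketch matches keys exactly. Note that the question's YAML spells the key `okToRetryOnAllOperations`, while the solution uses `OkToRetryOnAllOperations`. It may be worth checking whether that difference matters in the real configuration.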
    

    Issue 2 - Even after setting the read timeout and connect timeout configurations, Ribbon did not respect them and still took 2 minutes to throw the error from the server.

    Solution 2 - Although Ribbon started retrying requests after the changes suggested in Solution 1 above, I still did not see it honoring <client>.ribbon.ReadTimeout and <client>.ribbon.ConnectTimeout.

    After spending some time on it, I figured out that this is because I was using RestTemplate.

    The Spring documentation says you can use a load-balanced RestTemplate to get retries, but it does not mention that the Ribbon timeouts won't apply to it. According to a Stack Overflow answer from 2014, Ribbon is added as an interceptor on a load-balanced RestTemplate only to resolve the service ID to a URI. The request itself is not sent through Ribbon's own HTTP client; it goes through the HTTP client provided by the RestTemplate. Thus, the Ribbon-specific <client>.ribbon.ReadTimeout and <client>.ribbon.ConnectTimeout are NOT honored. After I added timeouts to the RestTemplate, requests started timing out at the expected intervals.
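The timeouts belong to whichever HTTP client actually performs the call. Here is a plain-JDK sketch that shows this using `HttpURLConnection`, the client behind Spring's `SimpleClientHttpRequestFactory`, which a `new RestTemplate()` uses by default. The sketch sets the connect and read timeouts explicitly. It then calls a local socket that accepts TCP connections but never answers, so the call fails after roughly the read timeout instead of hanging. The class name, the silent local server and the 500/1000 ms values are assumptions for illustration; the timeout values are borrowed from the `ConnectTimeout`/`ReadTimeout` in the question's YAML.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.SocketTimeoutException;
import java.net.URL;

// Hypothetical demo: timeouts set on the HTTP client itself are what bound the call.
public class ClientTimeoutSketch {
    enum Outcome { RESPONDED, TIMED_OUT, FAILED }

    // Performs a GET with explicit connect/read timeouts on the HttpURLConnection.
    static Outcome get(String url, int connectTimeoutMs, int readTimeoutMs) {
        HttpURLConnection conn = null;
        try {
            conn = (HttpURLConnection) new URL(url).openConnection();
            conn.setConnectTimeout(connectTimeoutMs); // 0 would mean "wait indefinitely"
            conn.setReadTimeout(readTimeoutMs);
            conn.getResponseCode(); // blocks until a status line arrives or a timeout fires
            return Outcome.RESPONDED;
        } catch (SocketTimeoutException e) {
            return Outcome.TIMED_OUT;
        } catch (IOException e) {
            return Outcome.FAILED;
        } finally {
            if (conn != null) {
                conn.disconnect();
            }
        }
    }

    // A listening socket on 127.0.0.1 that never accepts or answers: TCP connects
    // complete via the listen backlog, but no HTTP response is ever sent.
    static ServerSocket silentServer() {
        try {
            return new ServerSocket(0, 50, InetAddress.getByName("127.0.0.1"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket silent = silentServer()) {
            String url = "http://127.0.0.1:" + silent.getLocalPort() + "/ingest";
            long start = System.nanoTime();
            Outcome outcome = get(url, 500, 1000);
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println(outcome + " after ~" + elapsedMs + " ms");
        }
    }
}
```

In the Spring setup, the equivalent knobs are the `setConnectTimeout`/`setReadTimeout` calls on the RestTemplate's request factory, as in the author's bean below.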

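    Lastly, Issue 3 - I enabled logs by passing a custom HTTP client (Apache HttpComponents, via HttpComponentsClientHttpRequestFactory) to the RestTemplate. The same request factory is where I set the timeouts, and the second bean adds an exponential backoff policy for the retries: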

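    import org.springframework.cloud.client.loadbalancer.LoadBalanced;
    import org.springframework.cloud.client.loadbalancer.LoadBalancedBackOffPolicyFactory;
    import org.springframework.context.annotation.Bean;
    import org.springframework.http.client.HttpComponentsClientHttpRequestFactory;
    import org.springframework.retry.backoff.BackOffPolicy;
    import org.springframework.retry.backoff.ExponentialBackOffPolicy;
    import org.springframework.web.client.RestTemplate;

    // Assumed to be declared inside a @Configuration class (e.g. the application class).

    @LoadBalanced
    @Bean
    RestTemplate loadBalanced() {
        // Apache HttpClient-backed factory: these timeouts are the ones that apply,
        // and with org.apache.http at DEBUG its request/response logs become visible.
        HttpComponentsClientHttpRequestFactory factory = new HttpComponentsClientHttpRequestFactory();
        factory.setReadTimeout(1000 * 30);             // 30 s waiting for response data
        factory.setConnectTimeout(1000 * 3);           // 3 s to establish the connection
        factory.setConnectionRequestTimeout(1000 * 3); // 3 s to lease a pooled connection
        System.out.println("returning load balanced rest client");
        return new RestTemplate(factory);
    }

    @Bean
    LoadBalancedBackOffPolicyFactory backOffPolicyFactory() {
        // Back off exponentially between the retries made by the load-balanced RestTemplate.
        return new LoadBalancedBackOffPolicyFactory() {
            @Override
            public BackOffPolicy createBackOffPolicy(String service) {
                return new ExponentialBackOffPolicy();
            }
        };
    }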
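For reference, Spring Retry's `ExponentialBackOffPolicy` defaults to an initial interval of 100 ms, a multiplier of 2.0 and a maximum interval of 30,000 ms. Consecutive retries therefore wait roughly 100, 200, 400, 800 ms and so on, until the delay reaches the cap. The plain-Java sketch below only computes that schedule. It does not use Spring Retry, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of an exponential backoff schedule (not Spring Retry's code).
public class BackoffSchedule {
    // Delay before each retry: initial * multiplier^n, capped at maxMs.
    static List<Long> delays(int retries, long initialMs, double multiplier, long maxMs) {
        List<Long> result = new ArrayList<>();
        double current = initialMs;
        for (int i = 0; i < retries; i++) {
            result.add((long) Math.min(current, maxMs));
            current *= multiplier;
        }
        return result;
    }

    public static void main(String[] args) {
        // Parameters matching ExponentialBackOffPolicy's defaults.
        System.out.println(delays(10, 100, 2.0, 30_000));
        // [100, 200, 400, 800, 1600, 3200, 6400, 12800, 25600, 30000]
    }
}
```

The real policy can be tuned through its `setInitialInterval`, `setMultiplier` and `setMaxInterval` setters if the defaults back off too slowly or too aggressively for the configured retry counts.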
    

    With all these changes, requests are retried, time out at the expected intervals and back off exponentially, and the request/response logs are visible as expected. Good luck!