spring, spring-cloud, spring-cloud-netflix, spring-cloud-feign

NullPointerException in LoadBalancerFeignClient (spring-cloud-netflix)


We use Feign for the clients in our services. Recently one of the services started to throw exceptions at random, caused by:

Caused by: java.lang.NullPointerException: null
    at org.springframework.cloud.netflix.feign.ribbon.LoadBalancerFeignClient.execute(LoadBalancerFeignClient.java:63)
    at org.springframework.cloud.sleuth.instrument.web.client.feign.TraceLoadBalancerFeignClient.execute(TraceLoadBalancerFeignClient.java:41)
    at feign.SynchronousMethodHandler.executeAndDecode(SynchronousMethodHandler.java:97)
    at feign.SynchronousMethodHandler.invoke(SynchronousMethodHandler.java:76)
    at feign.hystrix.HystrixInvocationHandler$1.run(HystrixInvocationHandler.java:108)
    at com.netflix.hystrix.HystrixCommand$2.call(HystrixCommand.java:301)
    at com.netflix.hystrix.HystrixCommand$2.call(HystrixCommand.java:297)
    at rx.internal.operators.OnSubscribeDefer.call(OnSubscribeDefer.java:46)
    at rx.internal.operators.OnSubscribeDefer.call(OnSubscribeDefer.java:35)
    at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:48)
    at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:30)
    at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:48)
    at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:30)
    at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:48)
    at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:30)
    at rx.Observable.unsafeSubscribe(Observable.java:10211)
    at rx.internal.operators.OnSubscribeDefer.call(OnSubscribeDefer.java:51)
    at rx.internal.operators.OnSubscribeDefer.call(OnSubscribeDefer.java:35)
    at rx.Observable.unsafeSubscribe(Observable.java:10211)
    at rx.internal.operators.OnSubscribeDoOnEach.call(OnSubscribeDoOnEach.java:41)
    at rx.internal.operators.OnSubscribeDoOnEach.call(OnSubscribeDoOnEach.java:30)
    at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:48)
    at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:30)
    at rx.Observable.unsafeSubscribe(Observable.java:10211)
    at rx.internal.operators.OperatorSubscribeOn$1.call(OperatorSubscribeOn.java:94)
    at com.netflix.hystrix.strategy.concurrency.HystrixContexSchedulerAction$1.call(HystrixContexSchedulerAction.java:56)
    at com.netflix.hystrix.strategy.concurrency.HystrixContexSchedulerAction$1.call(HystrixContexSchedulerAction.java:47)
    at our.code.hystrix.AuthContextAwareHystrixConcurrencyStrategy$AuthorizationContextAwareCallable.call(AuthContextAwareHystrixConcurrencyStrategy.java:57)
    at org.springframework.cloud.sleuth.instrument.hystrix.SleuthHystrixConcurrencyStrategy$HystrixTraceCallable.call(SleuthHystrixConcurrencyStrategy.java:154)
    at com.netflix.hystrix.strategy.concurrency.HystrixContexSchedulerAction.call(HystrixContexSchedulerAction.java:69)
    at rx.internal.schedulers.ScheduledAction.run(ScheduledAction.java:55)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)

I took a look at the spring-cloud-netflix-core (v1.2.2.RELEASE) code and its dependencies, but I cannot figure out why the NPE is happening. The stack trace points to line 63 in LoadBalancerFeignClient, which is:

private CachingSpringLoadBalancerFactory lbClientFactory;

@Override
public Response execute(Request request, Request.Options options) throws IOException {
  try {
    URI asUri = URI.create(request.url());
    String clientName = asUri.getHost();
    URI uriWithoutHost = cleanUrl(request.url(), clientName);
    FeignLoadBalancer.RibbonRequest ribbonRequest = new FeignLoadBalancer.RibbonRequest(
        this.delegate, request, uriWithoutHost);

    IClientConfig requestConfig = getClientConfig(options, clientName);
    return lbClient(clientName).executeWithLoadBalancer(ribbonRequest, // Line 63
        requestConfig).toResponse();
  }
  catch (ClientException e) {
    IOException io = findIOException(e);
    if (io != null) {
      throw io;
    }
    throw new RuntimeException(e);
  }
}

private FeignLoadBalancer lbClient(String clientName) {
  return this.lbClientFactory.create(clientName);
}

This means lbClient(clientName) is the only call on that line that could return null. Looking at the CachingSpringLoadBalancerFactory class and its implementation, I found this in the documentation of ConcurrentReferenceHashMap, which the factory uses for its cache:

NOTE: The use of references means that there is no guarantee that items placed into the map will be subsequently available. The garbage collector may discard references at any time, so it may appear that an unknown thread is silently removing entries.

Now my question is why it's happening and how to solve it. Thanks.


Solution

  • For the record, this is the issue on GitHub: https://github.com/spring-cloud/spring-cloud-netflix/issues/2443. It has been fixed in a newer version.
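
To illustrate how a reference-based cache could produce the NPE described in the question, here is a minimal sketch. It assumes the factory looks up its cache with a check-then-get pattern (containsKey followed by get); that is an assumption about the 1.2.2 code, which the question does not quote. VanishingMap, racyLookup and safeLookup are hypothetical names made up for this example. VanishingMap is a plain HashMap that drops the entry right after containsKey, standing in for the garbage collector clearing a reference between the two calls.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class ReferenceCacheSketch {

    /**
     * Hypothetical stand-in for a reference-based map: the entry is removed
     * right after containsKey() answers, simulating the garbage collector
     * clearing the reference between two map calls.
     */
    static class VanishingMap<K, V> extends HashMap<K, V> {
        @Override
        public boolean containsKey(Object key) {
            boolean present = super.containsKey(key);
            super.remove(key); // simulated GC: the entry disappears here
            return present;
        }
    }

    /** Check-then-get: can return null if the entry vanishes between the two calls. */
    static <K, V> V racyLookup(Map<K, V> cache, K key, Function<K, V> factory) {
        if (cache.containsKey(key)) {
            return cache.get(key); // may be null by now
        }
        V created = factory.apply(key);
        cache.put(key, created);
        return created;
    }

    /** Single get plus null check: falls back to creating the value instead of returning null. */
    static <K, V> V safeLookup(Map<K, V> cache, K key, Function<K, V> factory) {
        V existing = cache.get(key);
        if (existing != null) {
            return existing;
        }
        V created = factory.apply(key);
        cache.put(key, created);
        return created;
    }

    public static void main(String[] args) {
        Map<String, String> cache = new VanishingMap<>();

        cache.put("service-a", "client-for-service-a");
        // The entry is present at containsKey() but gone at get().
        System.out.println(racyLookup(cache, "service-a", k -> "client-for-" + k));

        cache.put("service-a", "client-for-service-a");
        // A single get() either sees the value or recreates it; never null.
        System.out.println(safeLookup(cache, "service-a", k -> "client-for-" + k));
    }
}
```

With the simulated map, racyLookup returns null, which is the kind of value that would then fail on the chained executeWithLoadBalancer call, while safeLookup always returns a usable value. If upgrading is not immediately possible, the linked GitHub issue is the place to check for the actual change made in the library.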