Tags: java, connection, microservices, jbpm, feign

My FeignClient keeps the connection open waiting for the server's response, BUT the server waits for my client to close the connection before returning anything


I am developing a Java microservice that both calls and is called by a jBPM server. In my microservice, I have implemented a @FeignClient (org.springframework.cloud.openfeign.FeignClient) to make calls to the jBPM server; in particular, the first call has to start the jBPM process.

My "startProcess" API:

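import org.springframework.cloud.openfeign.FeignClient;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestHeader;

// Declarative client for the jBPM (KIE Server) REST API; the ${...} values come from application properties.
@FeignClient(name = "${resource.jbpmService.name}",
    url = "${resource.jbpmService.url}",
    path = "${resource.jbpmService.path}",
    configuration = com.my.MsClientErrorDecoder.class)
public interface JbpmServiceResource {

    // Starts a new process instance and returns its processInstanceID.
    @PostMapping(path = "/server/containers/{containerID}/processes/{processID}/instances",
        consumes = "application/json",
        produces = "application/json")
    Long startProcess(
        @RequestHeader("Authorization") String authHeader,
        @PathVariable(name = "containerID") String containerID,
        @PathVariable(name = "processID") String processID,
        @RequestBody(required = true) StartProcessBody idRequest);
}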
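For reference, the HTTP request that the annotated `startProcess` method describes can be written out with the JDK's own `java.net.http` client (Java 11+). This is a hypothetical helper for illustration only, not part of the original code: the name `buildStartProcess`, the base URL and the JSON body are assumptions. As far as I know, Spring Cloud OpenFeign's Spring MVC contract turns `produces` into the `Accept` header and `consumes` into `Content-Type`. Note that `HttpRequest.timeout` bounds the wait for the whole response, so it is only roughly comparable to Feign's socket read timeout.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.Duration;

public class StartProcessRequestSketch {

    // Hypothetical helper: builds the POST that startProcess describes.
    // baseUrl stands for ${resource.jbpmService.url} + ${resource.jbpmService.path};
    // containerId and processId are assumed to be URL-safe (no escaping is done).
    static HttpRequest buildStartProcess(String baseUrl, String authHeader,
                                         String containerId, String processId,
                                         String jsonBody, Duration timeout) {
        URI uri = URI.create(baseUrl + "/server/containers/" + containerId
                + "/processes/" + processId + "/instances");
        return HttpRequest.newBuilder(uri)
                .timeout(timeout)                            // whole-response timeout, not a socket read timeout
                .header("Authorization", authHeader)         // @RequestHeader("Authorization")
                .header("Content-Type", "application/json")  // from consumes
                .header("Accept", "application/json")        // from produces
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody)) // serialized StartProcessBody (assumed JSON)
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = buildStartProcess(
                "http://localhost:8080/kie-server/services/rest", // assumed host; path taken from the error log
                "Bearer example-token", "jBPMContainerName", "jBPMProcessName",
                "{}", Duration.ofSeconds(10));
        System.out.println(request.method() + " " + request.uri());
    }
}
```

Building the request does not open a connection, so this only shows what goes on the wire; it cannot by itself reproduce the timeout.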

When the "startProcess" call starts, my client successfully contacts the jBPM server. At this point, before completing the request above, jBPM has to call an API of my microservice to retrieve some basic information it needs. However, my microservice (where the @FeignClient call is still in progress) does not seem to accept any incoming requests until "startProcess" has finished, and in the meantime the jBPM server keeps waiting. Once the read timeout of "startProcess" expires, my @FeignClient times out and throws the following exception:

feign.RetryableException: Read timed out executing POST http://xx.yy.zz.ww:8080/kie-server/services/rest/server/containers/jBPMContainerName/processes/jBPMProcessName/instances
    at feign.FeignException.errorExecuting(FeignException.java:84) ~[feign-core-10.1.0.jar!/:?]
    at feign.SynchronousMethodHandler.executeAndDecode(SynchronousMethodHandler.java:113) ~[feign-core-10.1.0.jar!/:?]
    at feign.SynchronousMethodHandler.invoke(SynchronousMethodHandler.java:78) ~[feign-core-10.1.0.jar!/:?]
    at feign.ReflectiveFeign$FeignInvocationHandler.invoke(ReflectiveFeign.java:103) ~[feign-core-10.1.0.jar!/:?]
    at com.sun.proxy.$Proxy215.startProcess(Unknown Source) ~[?:?]

[JbpmServiceResource#startProcess] <--- ERROR SocketTimeoutException: Read timed out (10005ms)

As a result, "startProcess" fails with an HTTP 500 error and the call is closed. Only at this point does the jBPM server notice that the connection is free and call the "getRequestInfo" API of my microservice to retrieve the basic information. Finally, the jBPM server returns the processInstanceID that "startProcess" was expected to return.

My "getRequestInfo" API:

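import javax.validation.constraints.NotNull;
import org.springframework.stereotype.Service;
import org.springframework.validation.annotation.Validated;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

// BASEURL and the injected uawJbpmComponent are defined elsewhere (not shown).
@Service
@RestController
@Validated
@RequestMapping(BASEURL)
public class UawJbpmService {

    // Endpoint that jBPM calls back while "startProcess" is still in progress.
    @GetMapping("/get-request-info/{idRequest}")
    public UawRequestInfoJBPM getRequestInfo(@PathVariable @NotNull int idRequest) {
        return uawJbpmComponent.getRequestInfo(idRequest);
    }
}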

This call succeeds (HTTP 200) within a few milliseconds. Each call works fine on its own, so there are no reachability problems between my microservice and jBPM. The problem is that my microservice does not seem to accept more than one connection at a time (in particular to and from jBPM). Note also that the initial call to jBPM runs jBPM's internal logic, which we cannot modify.

How may I make this work?

Things I have tried already

  1. I have tried changing the return type of "startProcess" to String and then to void, but neither worked.
  2. I have tried increasing both the Feign connect-timeout and read-timeout, but that did not work either.
  3. I have tried creating a second microservice as middleware between my initial microservice and jBPM. The idea was that my microservice would call the middleware, which would in turn call jBPM to start the process. In this case, my initial microservice keeps its connection to the middleware open and jBPM waits until that connection is closed. Only after the timeout does jBPM call "getRequestInfo", but by then my microservice has already returned an HTTP 500. The situation is the same as before.

Solution

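  • My team and I solved the issue.

    Our microservice is deployed on Kubernetes, and we were running a single pod for it. It turned out that the pod was overloaded and could not handle more requests. Because "startProcess" keeps a request in flight while jBPM calls back into the same microservice, the "getRequestInfo" callback needs spare capacity in that same pod; with the only pod saturated, the callback could be served only after the outer call had timed out.

    The solution was to add a replica of our pod on Kubernetes.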
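The circular wait can be reproduced with a minimal JDK-only sketch (Java 11+). It is an analogy, not the actual deployment: a `com.sun.net.httpserver.HttpServer` with a fixed number of worker threads stands in for the pod's request-handling capacity, "/start" plays the role of "startProcess" and "/info" the role of "getRequestInfo". jBPM itself is left out: the "/start" handler calls back into the same server directly. The class name, paths and timeouts are hypothetical. With one worker the callback cannot be served until the outer handler gives up; with two workers (more capacity, roughly what a second replica provides) it goes through.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.net.http.HttpTimeoutException;
import java.nio.charset.StandardCharsets;
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class CallbackCapacityDemo {

    // Sends a plain-text response and closes the response body.
    static void reply(HttpExchange ex, int status, String body) throws IOException {
        byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
        ex.sendResponseHeaders(status, bytes.length);
        try (OutputStream os = ex.getResponseBody()) {
            os.write(bytes);
        }
    }

    // Starts a server whose handlers run on `workers` threads, calls "/start" once,
    // and returns the body that "/start" sent back.
    static String run(int workers, Duration callbackTimeout) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        HttpClient client = HttpClient.newBuilder()
                .version(HttpClient.Version.HTTP_1_1) // plain HTTP/1.1, no h2c upgrade attempt
                .build();
        URI base = URI.create("http://127.0.0.1:" + server.getAddress().getPort());

        // Stand-in for getRequestInfo: answers immediately once a worker thread is free.
        server.createContext("/info", ex -> reply(ex, 200, "info"));

        // Stand-in for startProcess: while its own request is still open, it needs
        // a callback served by the SAME server before it can answer.
        server.createContext("/start", ex -> {
            HttpRequest callback = HttpRequest.newBuilder(base.resolve("/info"))
                    .timeout(callbackTimeout)
                    .GET()
                    .build();
            try {
                String info = client.send(callback, HttpResponse.BodyHandlers.ofString()).body();
                reply(ex, 200, "started with " + info);
            } catch (HttpTimeoutException e) {
                reply(ex, 500, "callback timed out"); // mirrors "Read timed out" -> HTTP 500
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                reply(ex, 500, "interrupted");
            }
        });

        server.setExecutor(pool);
        server.start();
        try {
            HttpRequest start = HttpRequest.newBuilder(base.resolve("/start"))
                    .timeout(callbackTimeout.multipliedBy(5))
                    .GET()
                    .build();
            return client.send(start, HttpResponse.BodyHandlers.ofString()).body();
        } finally {
            server.stop(0);
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        Duration timeout = Duration.ofSeconds(1);
        System.out.println("1 worker:  " + run(1, timeout));  // the only thread is busy in "/start"
        System.out.println("2 workers: " + run(2, timeout));  // a spare thread serves "/info"
    }
}
```

In the one-worker run the TCP connection for the callback is accepted, but no thread picks up the request until "/start" has already answered with a 500, which matches the sequence described in the question. Raising the timeout alone does not change that outcome, because the callback still cannot start until the outer handler frees its thread.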