
Apache HTTPClient5 - How to Prevent Connection/Stream Refused


Problem Statement

Context

  • I'm a Software Engineer in Test running order permutations of Restaurant Menu Items to confirm that order placement succeeds w/ the POS
    • In short, this POSTs a JSON payload to an endpoint, which then validates the order w/ a POS to determine success/fail/other
    • The POS, and therefore the Transactions per Second (TPS), may vary, but each Back End uses the same core handling
    • There can be as many as ~22,000 permutations per item, each an easily manageable JSON payload, and they need to be processed as quickly as possible
    • The network can vary wildly depending on the Restaurant and/or Region being tested
      • E.g. some have much higher latency than others
    • Therefore, the HTTPClient should be able to intelligently negotiate the same content & endpoint regardless of this
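With ~22,000 small requests per item and a pool capped at a fixed number of connections, one way to keep throughput high without piling up requests that only wait for a lease is to cap the number of in-flight requests at the pool size. Below is a minimal JDK-only sketch, assuming each request is exposed as a `CompletableFuture`; `BoundedDispatcher` and `runAll` are hypothetical names, not part of HttpClient.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.function.Function;

// Hypothetical helper (not part of HttpClient): caps how many async calls are in flight.
final class BoundedDispatcher {
    private BoundedDispatcher() {}

    static <T, R> List<R> runAll(List<T> items, int maxInFlight,
                                 Function<T, CompletableFuture<R>> call) {
        Semaphore permits = new Semaphore(maxInFlight);
        List<CompletableFuture<R>> pending = new ArrayList<>();
        for (T item : items) {
            permits.acquireUninterruptibly(); // wait for a free slot before issuing
            CompletableFuture<R> future;
            try {
                future = call.apply(item);
            } catch (RuntimeException e) {
                permits.release(); // the call never started, so give the slot back
                throw e;
            }
            // Release the slot when the call finishes, successfully or not.
            pending.add(future.whenComplete((result, error) -> permits.release()));
        }
        List<R> results = new ArrayList<>(pending.size());
        for (CompletableFuture<R> f : pending) {
            results.add(f.join()); // rethrows (wrapped) if that call failed
        }
        return results;
    }
}
```

Setting `maxInFlight` equal to the pool's total connection limit is one reasonable starting point. This only limits client-side queueing; it does not by itself prevent refused streams.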

Direct Problem

  • I'm using Apache's HTTP Client 5 w/ PoolingAsyncClientConnectionManager to execute both the GET for the Menu contents and the POST to check whether the order succeeds
  • This works out of the box, but sometimes loses connections w/ Stream Refused, specifically:
    • org.apache.hc.core5.http2.H2StreamResetException: Stream refused
  • No single tuning that I can find works across all network contexts w/ variable latency
  • Following the stack trace suggests the connection had already been closed (or was closing), so I need a way to either keep it open or avoid executing on an already-closed connection
if (connState == ConnectionHandshake.GRACEFUL_SHUTDOWN) {
    throw new H2StreamResetException(H2Error.PROTOCOL_ERROR, "Stream refused");
}

Some Attempts to Fix Problem

  • Tried search engines, but there are few hits for HTTPClient5
  • Tried the official documentation, but it is sparse
  • Reducing max connections per route, and changing the validate-after-inactivity interval or the connection time to live
    • The inactivity checks may fix the POST, but stall the GET for some transactions
    • And tuning that works for one region/restaurant then breaks another, w/ the network as the only variable
PoolingAsyncClientConnectionManager manager = PoolingAsyncClientConnectionManagerBuilder
        .create()
        .setTlsStrategy(getTlsStrategy())
        .setMaxConnPerRoute(12)
        .setMaxConnTotal(12)
        .setValidateAfterInactivity(TimeValue.ofMilliseconds(1000))
        .setConnectionTimeToLive(TimeValue.ofMinutes(2))
        .build(); // build() returns the connection manager, not the builder
  • Shifting to a custom RequestConfig w/ different timeouts
private HttpClientContext getHttpClientContext() {
    RequestConfig requestConfig = RequestConfig.custom()
            .setConnectTimeout(Timeout.of(10, TimeUnit.SECONDS))
            .setResponseTimeout(Timeout.of(10, TimeUnit.SECONDS))
            .build();

    HttpClientContext httpContext = HttpClientContext.create();
    httpContext.setRequestConfig(requestConfig);
    return httpContext;
}

Initial Code Segments for Analysis

(In addition to the above segments w/ change attempts)

  • Wrapper that initializes the client and gets the response
public SimpleHttpResponse getFullResponse(String url, PoolingAsyncClientConnectionManager manager, SimpleHttpRequest req) {
    try (CloseableHttpAsyncClient httpclient = getHTTPClientInstance(manager)) {
        httpclient.start();

        CountDownLatch latch = new CountDownLatch(1);
        long startTime = System.currentTimeMillis();
        Future<SimpleHttpResponse> future = getHTTPResponse(url, httpclient, latch, startTime, req);

        latch.await();
        return future.get();
    } catch (IOException | InterruptedException | ExecutionException e) {
        if (e instanceof InterruptedException) {
            Thread.currentThread().interrupt(); // restore the interrupt flag
        }
        e.printStackTrace();
        // 999 is a sentinel status for client-side failures
        return new SimpleHttpResponse(999, CommonUtils.getExceptionAsMap(e).toString());
    }
}
  • With the actual handler and probing (timing/logging) code
private Future<SimpleHttpResponse> getHTTPResponse(String url, CloseableHttpAsyncClient httpclient, CountDownLatch latch, long startTime, SimpleHttpRequest req) {
    // Uses the RequestConfig-based context defined in getHttpClientContext()
    return httpclient.execute(req, getHttpClientContext(), new FutureCallback<SimpleHttpResponse>() {

        @Override
        public void completed(SimpleHttpResponse response) {
            latch.countDown();
            logger.info("[{}][{}ms] - {}", response.getCode(), getTotalTime(startTime), url);
        }

        @Override
        public void failed(Exception e) {
            latch.countDown();
            logger.error("[{}ms] - {} - {}", getTotalTime(startTime), url, e);
        }

        @Override
        public void cancelled() {
            latch.countDown();
            logger.error("[{}ms] - request cancelled for {}", getTotalTime(startTime), url);
        }

    });
}
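Two observations on the wrapper, offered as a sketch rather than a fix. `future.get()` already blocks until the callback-driven future completes, so the `CountDownLatch` appears redundant. An unbounded wait can also hang a run if a request never completes. The JDK-only helper below (hypothetical name `FutureWait`) waits with a timeout, restores the interrupt flag, and falls back to a sentinel value, much like the `999` response above.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: bounded wait on a response future, no separate latch.
final class FutureWait {
    private FutureWait() {}

    static <T> T awaitOrDefault(Future<T> future, long timeout, TimeUnit unit, T fallback) {
        try {
            return future.get(timeout, unit); // blocks until completed, failed, or timed out
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt visible to callers
            return fallback;
        } catch (ExecutionException | CancellationException e) {
            return fallback; // the request failed (e.g. a refused stream) or was cancelled
        } catch (TimeoutException e) {
            future.cancel(true); // stop waiting on this request
            return fallback;
        }
    }
}
```

Separately, and only as a possibility worth ruling out: `getFullResponse` creates and closes a client per request on the shared `manager`. As far as I know, unless the client is built with `setConnectionManagerShared(true)`, closing it also closes its connection manager. That would gracefully shut down pooled connections that other in-flight requests may still be using.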

Direct Question

  • Is there a way to configure the client so that it handles these variances on its own, without explicitly modifying the configuration for each endpoint context?

Solution

  • Fixed w/ a combination of the below to ensure the connection is live/ready

    (Or at least it is now stable)

    Forcing HTTP 1

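    CloseableHttpAsyncClient client = HttpAsyncClients.custom()
        .setConnectionManager(manager)
        .setRetryStrategy(getRetryStrategy())
        .setVersionPolicy(HttpVersionPolicy.FORCE_HTTP_1)
        // shared: closing this client does not close the connection manager
        .setConnectionManagerShared(true)
        .build();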
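`getRetryStrategy()` isn't shown. As far as I know, HttpClient 5's `DefaultHttpRequestRetryStrategy` only retries I/O failures for idempotent methods, so a refused POST would not be retried by default. If a refused stream was never processed by the server (which the quoted multiplexer check appears to suggest, since it refuses before the stream is used), re-issuing it is one plausible application-level fallback. Below is a JDK-only sketch; `RefusedRetry` is hypothetical, matching on the exception message is an assumption, and retrying a POST is only safe if the order endpoint tolerates duplicates or the request is known not to have been sent.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical application-level retry (not HttpClient's retry API).
final class RefusedRetry {
    private RefusedRetry() {}

    // Re-issues the call while the failure matches isRetriable, up to maxAttempts in total.
    static <T> CompletableFuture<T> withRetry(Supplier<CompletableFuture<T>> call,
                                              Predicate<Throwable> isRetriable,
                                              int maxAttempts) {
        return attempt(call, isRetriable, maxAttempts, 1);
    }

    private static <T> CompletableFuture<T> attempt(Supplier<CompletableFuture<T>> call,
                                                    Predicate<Throwable> isRetriable,
                                                    int maxAttempts, int attemptNo) {
        CompletableFuture<CompletableFuture<T>> next = call.get().handle((value, error) -> {
            if (error == null) {
                return CompletableFuture.completedFuture(value);
            }
            Throwable cause = unwrap(error);
            if (attemptNo < maxAttempts && isRetriable.test(cause)) {
                return attempt(call, isRetriable, maxAttempts, attemptNo + 1); // fresh attempt
            }
            return CompletableFuture.<T>failedFuture(cause); // give up, surface the cause
        });
        return next.thenCompose(f -> f);
    }

    private static Throwable unwrap(Throwable t) {
        return (t instanceof CompletionException && t.getCause() != null) ? t.getCause() : t;
    }

    // Assumption: matches on the message from the question's stack trace.
    static boolean isStreamRefused(Throwable t) {
        return t.getMessage() != null && t.getMessage().contains("Stream refused");
    }
}
```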
    

    Setting Effective Headers for POST

    • Specifically the close header
      • req.setHeader("Connection", "close, TE");
      • Note: The inactivity check helps, but there are still occasional refusals w/o this
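Why the close header forces a fresh connection can be shown with the HTTP/1.x persistence rule: an HTTP/1.1 connection stays open unless `close` appears in the `Connection` header, while HTTP/1.0 closes unless `keep-alive` is listed. HTTP/2 forbids connection-specific header fields altogether. Below is a simplified, illustrative model (hypothetical class `ConnectionPersistence`; real parsers handle more edge cases).

```java
import java.util.Arrays;
import java.util.Locale;

// Simplified HTTP/1.x persistence rule (illustrative only).
final class ConnectionPersistence {
    private ConnectionPersistence() {}

    // HTTP/1.1 persists unless "close" is listed; HTTP/1.0 closes unless "keep-alive" is listed.
    static boolean isPersistent(String httpVersion, String connectionHeader) {
        if (hasToken(connectionHeader, "close")) {
            return false;
        }
        if ("HTTP/1.1".equals(httpVersion)) {
            return true;
        }
        return hasToken(connectionHeader, "keep-alive");
    }

    // Connection is a comma-separated list of case-insensitive tokens, e.g. "close, TE".
    static boolean hasToken(String headerValue, String token) {
        if (headerValue == null) {
            return false;
        }
        String wanted = token.toLowerCase(Locale.ROOT);
        return Arrays.stream(headerValue.split(","))
                .map(part -> part.trim().toLowerCase(Locale.ROOT))
                .anyMatch(wanted::equals);
    }
}
```

With `"close, TE"`, `isPersistent("HTTP/1.1", "close, TE")` is `false`, so the connection is not returned for reuse after the POST.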

    Setting Inactivity Checks by Type

    • Set POSTs to validate immediately after inactivity
      • Note: Using 1000 for both caused a high drop rate for some systems
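    // manager used for POSTs (variable name illustrative; other settings as above)
    PoolingAsyncClientConnectionManager postManager = PoolingAsyncClientConnectionManagerBuilder
        .create()
        .setValidateAfterInactivity(TimeValue.ofMilliseconds(0))
        .build();

    • Set GET to validate after 1s
    // manager used for GETs (variable name illustrative; other settings as above)
    PoolingAsyncClientConnectionManager getManager = PoolingAsyncClientConnectionManagerBuilder
        .create()
        .setValidateAfterInactivity(TimeValue.ofMilliseconds(1000))
        .build();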

    Given the Error Context

    • Tracing the connection problem in the stack trace leads to AbstractH2StreamMultiplexer
    • Which shows ConnectionHandshake.GRACEFUL_SHUTDOWN as the trigger for the stream refusal
    if (connState == ConnectionHandshake.GRACEFUL_SHUTDOWN) {
        throw new H2StreamResetException(H2Error.PROTOCOL_ERROR, "Stream refused");
    }
    
    • Where that state is assigned by
    connState = streamMap.isEmpty() ? ConnectionHandshake.SHUTDOWN : ConnectionHandshake.GRACEFUL_SHUTDOWN;
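As I read the quoted code, once the connection leaves the active state (per the HTTP/2 spec, a peer that receives GOAWAY must not open new streams on that connection; a local graceful close behaves similarly), existing streams may finish but any new stream is refused. A heavily simplified, hypothetical model (`H2ConnectionModel` is not the library's class) shows the sequence: a connection that still looks usable from the pool refuses the next request.

```java
import java.io.IOException;

// Hypothetical, heavily simplified model of the quoted shutdown logic.
final class H2ConnectionModel {
    enum State { ACTIVE, GRACEFUL_SHUTDOWN, SHUTDOWN }

    private State state = State.ACTIVE;
    private int openStreams;

    // New streams are only allowed while the connection is ACTIVE.
    void openStream() throws IOException {
        if (state != State.ACTIVE) {
            throw new IOException("Stream refused");
        }
        openStreams++;
    }

    void closeStream() {
        if (openStreams > 0) {
            openStreams--;
        }
        if (state == State.GRACEFUL_SHUTDOWN && openStreams == 0) {
            state = State.SHUTDOWN; // last in-flight stream finished
        }
    }

    // Mirrors the quoted assignment: drain existing streams if any, else shut down.
    void beginShutdown() {
        state = openStreams == 0 ? State.SHUTDOWN : State.GRACEFUL_SHUTDOWN;
    }

    State state() {
        return state;
    }
}
```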
    

    Reasoning

    • If I'm understanding correctly:
      • The connections were being closed, intentionally or not (i.e. put into GRACEFUL_SHUTDOWN)
        • However, they were not being confirmed ready before being used again
        • Which caused the request to fail because a new stream could not be opened on them
      • Therefore the fix (seemingly) works because
        • Forcing HTTP/1 leaves only a single protocol context to manage
          • Where HttpVersionPolicy NEGOTIATE/FORCE_HTTP_2 had equal or greater failures across the range of regions/menus
        • It ensures that all connections are valid before use
        • And POSTs are always closed due to the close header, which is not allowed in HTTP/2
        • Therefore
          • GET is checked for validity at a reasonable interval
          • POST is checked every time, and since it is forcibly closed, a fresh connection is acquired before execution
          • Which leaves no room for unexpected closures
            • And removes the possibility that it was unexpectedly switching to HTTP/2
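To put a rough number on "sub-optimal": when every POST opens its own connection, each one pays at least a TCP handshake plus a TLS handshake before the request is sent (about 2 round trips with TLS 1.3, 3 with a full TLS 1.2 handshake). The arithmetic below is a back-of-the-envelope estimate; the RTT values and `CloseOverhead` are illustrative assumptions, not measurements.

```java
// Hypothetical estimate of handshake overhead when connections are not reused.
final class CloseOverhead {
    private CloseOverhead() {}

    // Extra wall-clock seconds: requests * extra round trips * RTT, spread across parallel connections.
    static double extraSeconds(long requests, double rttMillis, int extraRoundTrips, int parallelConnections) {
        return requests * extraRoundTrips * rttMillis / 1000.0 / parallelConnections;
    }
}
```

For example, 22,000 POSTs at a 100 ms RTT with 2 extra round trips over 12 connections gives about 367 s (roughly 6 minutes) of added handshake time; at a 20 ms RTT it is about 73 s. The gap widens with latency, which fits the regional variance described above.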

    Will accept this until a better answer comes along, as this is stable but sub-optimal.