EDIT: here https://github.com/wujek-srujek/reactor-retry-test is a repository with all the code.
I have the following Spring WebClient
code to POST to a remote server (Kotlin code without imports for brevity):
private val logger = KotlinLogging.logger {}
@Component
class Client(private val webClient: WebClient) {
companion object {
const val maxRetries = 2L
val firstBackOff = Duration.ofSeconds(5L)
val maxBackOff = Duration.ofSeconds(20L)
}
fun send(uri: URI, data: Data): Mono<Void> {
return webClient
.post()
.uri(uri)
.contentType(MediaType.APPLICATION_JSON)
.bodyValue(data)
.retrieve()
.toBodilessEntity()
.doOnSubscribe {
logger.info { "Calling backend, uri: $uri" }
}
.retryExponentialBackoff(maxRetries, firstBackOff, maxBackOff, jitter = false) {
logger.debug { "Call to $uri failed, will retry (#${it.iteration()} of max $maxRetries)" }
}
.doOnError {
logger.error { "Call to $uri with $maxRetries retries failed with $it" }
}
.doOnSuccess {
logger.info { "Call to $uri succeeded" }
}
.then()
}
}
(It returns an empty Mono
as we don't expect an answer, nor do we care about it.)
I would like to test 2 cases, and one of them is giving me headaches, namely the one in which I want to test that all the retries have been fired. We are using MockWebServer
(https://github.com/square/okhttp/tree/master/mockwebserver) and the StepVerifier
from reactor-test. (The test for success is easy and doesn't need any virtual time scheduler magic, and works just fine.) Here is the code for the failing one:
@JsonTest
@ContextConfiguration(classes = [Client::class, ClientConfiguration::class])
class ClientITest @Autowired constructor(
private val client: Client
) {
lateinit var server: MockWebServer
@BeforeEach
fun `init mock server`() {
server = MockWebServer()
server.start()
}
@AfterEach
fun `shutdown server`() {
server.shutdown()
}
@Test
fun `server call is retried and eventually fails`() {
val data = Data()
val uri = server.url("/server").uri()
val responseStatus = HttpStatus.INTERNAL_SERVER_ERROR
repeat((0..Client.maxRetries).count()) {
server.enqueue(MockResponse().setResponseCode(responseStatus.value()))
}
StepVerifier.withVirtualTime { client.send(uri, data) }
.expectSubscription()
.thenAwait(Duration.ofSeconds(10)) // wait for the first retry
.expectNextCount(0)
.thenAwait(Duration.ofSeconds(20)) // wait for the second retry
.expectNextCount(0)
.expectErrorMatches {
val cause = it.cause
it is RetryExhaustedException &&
cause is WebClientResponseException &&
cause.statusCode == responseStatus
}
.verify()
// assertions
}
}
I am using withVirtualTime
because I don't want the test to take nearly seconds.
The problem is that the test blocks indefinitely. Here is the (simplified) log output:
okhttp3.mockwebserver.MockWebServer : MockWebServer[51058] starting to accept connections
Calling backend, uri: http://localhost:51058/server
MockWebServer[51058] received request: POST /server HTTP/1.1 and responded: HTTP/1.1 500 Server Error
Call to http://localhost:51058/server failed, will retry (#1 of max 2)
Calling backend, uri: http://localhost:51058/server
MockWebServer[51058] received request: POST /server HTTP/1.1 and responded: HTTP/1.1 500 Server Error
Call to http://localhost:51058/server failed, will retry (#2 of max 2)
As you can see, the first retry works, but the second one blocks. I don't know how to write the test so that it doesn't happen. To make matters worse, the client will actually use jitter, which will make the timing hard to anticipate.
The following test using StepVerifier
but without WebClient
works fine, even with more retries:
@Test
fun test() {
StepVerifier.withVirtualTime {
Mono
.error<RuntimeException>(RuntimeException())
.retryExponentialBackoff(5,
Duration.ofSeconds(5),
Duration.ofMinutes(2),
jitter = true) {
println("Retrying")
}
.then()
}
.expectSubscription()
.thenAwait(Duration.ofDays(1)) // doesn't matter
.expectNextCount(0)
.expectError()
.verify()
}
Could anybody help me fix the test, and ideally, explain what is wrong?
This is a limitation of virtual time and the way the clock is manipulated in StepVerifier
. The thenAwait
methods are not synchronized with the underlying scheduling (that happens for example as part of the retryBackoff
operation). This means that the operator submits retry tasks at a point where the clock has already been advanced by one day. So the second retry is scheduled for + 1 day and 10 seconds
, since the clock is at +1 day
. After that, the clock is never advanced so the additional request is never made to MockWebServer
.
Your case is made even more complicated in the sense that there is an additional component involved, the MockWebServer
, that still works "in real time".
Though advancing the virtual clock is a very quick operation, the response from the MockWebServer
still goes through a socket and thus has some amount of latency to the retry scheduling, which makes things more complicated from the test writing perspective.
One possible solution to explore would be to externalize the creation of the VirtualTimeScheduler
and tie advanceTimeBy
calls to the mockServer.takeRequest()
, in a parallel thread.