Search code examples
spring-bootspring-webfluxproject-reactorrsocketspring-rsocket

repeatWhenEmpty operator doesn't work in the context of an RSocket websocket request but it does in HTTP calls


I've come across a strange behaviour with the repeatWhenEmpty operator that doesn't make any sense to me. I uploaded a repository on Github with a minimum reproducible sample: https://github.com/codependent/rsocket-rating-service.

Consider this controller which offers two endpoints (HTTP @GetMapping("/requestRating") - WebSocket @MessageMapping("request-rating"). Notice this call .repeatWhenEmpty(Repeat.onlyIf<Any> { true }.backoff(Backoff.fixed(Duration.ofSeconds(1)))) in the middle of the generateRating() chain:

@RestController
class RatingServiceRestController {

    private val FAIL_RATE = 0
    private val logger = LoggerFactory.getLogger(javaClass)

    @GetMapping("/requestRating")
    fun getRatingHttp(ratingRequest: RatingRequest): Mono<Rating> {
        return generateRating(ratingRequest)
    }

    @MessageMapping("request-rating")
    fun getRatingWebSocket(ratingRequest: RatingRequest): Mono<Rating> {
        return generateRating(ratingRequest)
    }

    private fun generateRating(ratingRequest: RatingRequest): Mono<Rating> {
        return doGenerateRating(ratingRequest)
                .doOnNext {
                    logger.info("Next1 {}", it)
                }
                .doOnCancel {
                    logger.info("Cancel1")
                }
                .doOnSuccess {
                    logger.info("Success1 {}", it)
                }
                .doOnError { throwable ->
                    logger.error("Error1 {}", throwable)
                }
                .doOnTerminate {
                    logger.info("Terminate1")
                }
                .repeatWhenEmpty(Repeat.onlyIf<Any> { true }.backoff(Backoff.fixed(Duration.ofSeconds(1))))
                .doOnNext {
                    logger.info("Next2 {}", it)
                }
                .doOnCancel {
                    logger.info("Cancel2")
                }
                .doOnSuccess {
                    logger.info("Success2 {}", it)
                }
                .doOnError { throwable ->
                    logger.error("Error2 {}", throwable)
                }
                .doOnTerminate {
                    logger.info("Terminate2")
                }
    }

    private fun doGenerateRating(ratingRequest: RatingRequest): Mono<Rating> {
        return Mono.defer {
            val random = (0..100).random()
            if (random <= FAIL_RATE) {
                Mono.empty()
            } else {
                Mono.just(Rating(ratingRequest.songId, (0..10).random()))
            }
        }
    }
}

After starting the application I can invoke http://localhost:8080/requestRating?songId=1234 and it returns a result, showing these logs as expected:

RatingServiceRestController    : Next1 Rating(songId=1234, value=1)
RatingServiceRestController    : Success1 Rating(songId=1234, value=1)
RatingServiceRestController    : Terminate1
RatingServiceRestController    : Cancel1
RatingServiceRestController    : Next2 Rating(songId=1234, value=1)
RatingServiceRestController    : Success2 Rating(songId=1234, value=1)
RatingServiceRestController    : Terminate2

When I invoke the same logic from a Websocket:

  1. Access http://localhost:8080/index.html
  2. Fill any String and push send

Oddly enough these are all the logs I see:

RatingServiceRestController    : Next1 Rating(songId=asfdasf, value=2)
RatingServiceRestController    : Success1 Rating(songId=asfdasf, value=2)
RatingServiceRestController    : Terminate1

Around three minutes later this shows up:

RatingServiceRestController    : Cancel2
RatingServiceRestController    : Cancel1

Despite having generated an element as seen in Next1, no doOnXXX operators after repeatWhenEmpty are invoked. The client obviously doesn't get a result either.

What's going on here? How can I use repeatWhenEmpty in the context of an RSocket websocket?

UPDATE:

I've added a log() operator to get some more info.

HTTP Request:

10:37:01.957  INFO 5202 --- [ctor-http-nio-2] reactor.Mono.Defer.1                     : onSubscribe(MonoNext.NextSubscriber)
10:37:01.959  INFO 5202 --- [ctor-http-nio-2] reactor.Mono.Defer.1                     : request(unbounded)
10:37:01.967  INFO 5202 --- [ctor-http-nio-2] c.c.r.r.c.RatingServiceRestController    : Next1 Rating(songId=1234, value=0)
10:37:01.968  INFO 5202 --- [ctor-http-nio-2] c.c.r.r.c.RatingServiceRestController    : Success1 Rating(songId=1234, value=0)
10:37:01.968  INFO 5202 --- [ctor-http-nio-2] c.c.r.r.c.RatingServiceRestController    : Terminate1
10:37:01.968  INFO 5202 --- [ctor-http-nio-2] c.c.r.r.c.RatingServiceRestController    : Cancel1
10:37:01.968  INFO 5202 --- [ctor-http-nio-2] reactor.Mono.Defer.1                     : onNext(Rating(songId=1234, value=0))
10:37:01.968  INFO 5202 --- [ctor-http-nio-2] c.c.r.r.c.RatingServiceRestController    : Next2 Rating(songId=1234, value=0)
10:37:01.968  INFO 5202 --- [ctor-http-nio-2] c.c.r.r.c.RatingServiceRestController    : Success2 Rating(songId=1234, value=0)
10:37:01.968  INFO 5202 --- [ctor-http-nio-2] c.c.r.r.c.RatingServiceRestController    : Terminate2
10:37:01.976  INFO 5202 --- [ctor-http-nio-2] reactor.Mono.Defer.1                     : onComplete()

RSocket Request:

10:37:29.143  INFO 5202 --- [ctor-http-nio-6] reactor.Mono.Defer.2                     : onContextUpdate(Context1{reactor.onDiscard.local=reactor.core.publisher.Operators$$Lambda$720/0x0000000800597c40@6dfdbeee})
10:37:29.143  INFO 5202 --- [ctor-http-nio-6] reactor.Mono.Defer.2                     : onSubscribe(MonoNext.NextSubscriber)
10:37:29.143  INFO 5202 --- [ctor-http-nio-6] reactor.Mono.Defer.2                     : request(1)
10:37:29.143  INFO 5202 --- [ctor-http-nio-6] c.c.r.r.c.RatingServiceRestController    : Next1 Rating(songId=asdf, value=0)
10:37:29.143  INFO 5202 --- [ctor-http-nio-6] c.c.r.r.c.RatingServiceRestController    : Success1 Rating(songId=asdf, value=0)
10:37:29.143  INFO 5202 --- [ctor-http-nio-6] c.c.r.r.c.RatingServiceRestController    : Terminate1

Three minutes later:

10:40:27.802  INFO 5202 --- [     parallel-1] c.c.r.r.c.RatingServiceRestController    : Cancel2
10:40:27.802  INFO 5202 --- [     parallel-1] reactor.Mono.Defer.2                     : cancel()
10:40:27.802  INFO 5202 --- [     parallel-1] c.c.r.r.c.RatingServiceRestController    : Cancel1

As you can see there are some differences

  1. In the RSocket request there's an onContextUpdate invocation.
  2. RSocket requests 1 element, HTTP unbounded
  3. RSocket execution seems to be retrying or doing something despite the emission of an element (onNext). The CPU is stuck at 6% doing some work, which didn't happen in HTTP calls as you can see in the following image:

enter image description here

UPDATE 2:

I've been debugging and found this difference in the execution during the first cancelation, specifically here in class Operators:

    public static void onDiscardMultiple(@Nullable Iterator<?> multiple, boolean knownToBeFinite, Context context) {
        if (multiple == null) return;
        if (!knownToBeFinite) return;

        Consumer<Object> hook = context.getOrDefault(Hooks.KEY_ON_DISCARD, null);
        if (hook != null) {
            try {
                multiple.forEachRemaining(o -> {
                    if (o != null) {
                        try {
                            hook.accept(o);
                        }
                        catch (Throwable t) {
                            log.warn("Error while discarding element from an Iterator, continuing with next element", t);
                        }
                    }
                });
            }
            catch (Throwable t) {
                log.warn("Error while discarding Iterator, stopping", t);
            }
        }
    }

HTTP:

Context:

enter image description here

Consumer<Object> hook = context.getOrDefault(Hooks.KEY_ON_DISCARD, null);

hook is null so it doesn't execute:

if (hook != null) {
            try {
                multiple.forEachRemaining(o -> {...

RSocket:

Context:

enter image description here

In this case hook is:

enter image description here

And it loops infinetely in the forEachRemaining block:

    if (hook != null) {
            try {
                multiple.forEachRemaining(o -> {
                    if (o != null) {
                        try {
                            hook.accept(o);
                        }
                        catch (Throwable t) {
                            log.warn("Error while discarding element from an Iterator, continuing with next element", t);
                        }
                    }
                });
            }

Solution

  • As seen in https://github.com/rsocket/rsocket-java/issues/860 there seems to be an issue in Project Reactor.

    I changed to the following operator as advised, which solved the problem:

    .repeatWhenEmpty(30) { longFlux ->
          longFlux.delayElements(Duration.ofSeconds(1))
                 .doOnNext { logger.info("Repeating {}", it) }
    }