Hi I am working with akka streams
along with akka-stream-kafka
. I am setting up a Stream with the below setup:
Source (Kafka) --> | Akka Actor Flow | --> Sink (MongoDB)
Actor Flow
basically by Actors that will process data, below is the hierarchy:
System
|
Master Actor
/ \
URLTypeHandler SerializedTypeHandler
/ \ |
Type1Handler Type2Handler SomeOtherHandler
So Kafka has the message, I write up the consumer and run it in atMostOnceSource
configuration and use
Consumer.Control control =
Consumer.atMostOnceSource(consumerSettings, Subscriptions.topics(TOPIC))
.mapAsyncUnordered(10, record -> processAccessLog(rootHandler, record.value()))
.to(Sink.foreach(it -> System.out.println("FinalReturnedString--> " + it)))
.run(materializer);
I've used a print as a sink initially, just to get the flow running.
and the processAccessLog
is defined as:
private static CompletionStage<String> processAccessLog(ActorRef handler, byte[] value) {
handler.tell(value, ActorRef.noSender());
return CompletableFuture.completedFuture("");
}
Now, from the definition ask
must be used when an actor is expecting a response, makes sense in this case since I want to return values to be written in the sink.
But everyone (including docs), mention to avoid ask
and rather use tell
and forward
, an amazing blog is written on it Don't Ask, Tell.
In the blog he mentions, in case of nested actors, use tell
for the first message and then use forward
for the message to reach the destination and then after processing directly send the message back to the root actor.
Now here is the problem,
ask
is Still the Right Pattern
From the linked blog article, one "drawback" of ask
is:
blocking an actor itself, which cannot pick any new messages until the response arrives and processing finishes.
However, in akka-stream
this is the exact feature we are looking for, a.k.a. "back-pressure". If the Flow
or Sink
are taking a long time to process data then we want the Source
to slow down.
As a side note, I think the claim in the blog post that the additional listener Actor
results in an implementation that is "dozens times heavier" is an exaggeration. Obviously an intermediate Actor adds some latency overhead but not 12x
more.
Elimination of Back-Pressure
Any implementation of what you are looking for would effectively eliminate back-pressure. An intermediate Flow that only used tell
would continuously propagate demand back to the Source regardless of whether or not your processing logic, within the handler Actors, was completing its calculations at the same speed that the Source is generating data.
Consider an extreme example: what if your Source could produce 1 million messages per second but the Actor receiving those messages via tell
could only process 1 message per second. What would happen to that Actor's mailbox?
By using the ask pattern in an intermediate Flow you are purposefully linking the speed of the handlers and the speed with which your Source produces data.
If you are willing to remove back-pressure signaling, from the Sink to the Source, then you might as well not use akka-stream in the first place. You can have either back-pressure or non-blocking messaging, but not both.