asynchronous, apache-flink, flink-streaming

Async Operation in A Flink Sink using CompletableFuture


Background

I am planning to set up a data pipeline using Flink.

The flow looks like this:

        Kafka --> Flink Job --> gRPC endpoint

Story so far

  1. A blocking implementation is up and running, but it will not scale to high QPS.
  2. I tried simulating async behavior here.

Problem

  • I am not sure how async processing would actually behave in Flink.
  • If CompletableFuture is used, each message is processed asynchronously, but will the next message be fetched before processing of the first one is complete? In other words, async processing can be achieved within a task manager, but how does the task manager behave when fetching the next message / tuple? Will it wait until the async operation is complete, or will it hand the work to the CompletableFuture / thread and fetch the next message?
  • Will a custom thread pool cause any issues if it is not shut down, given that the pipeline will run for a long time?
  • Is there any other way to achieve async behavior in a Flink sink?
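
For concreteness, the CompletableFuture approach described above can be sketched in plain Java roughly as follows. This is an illustration only, not the actual job: `FireAndForgetSink`, `invoke`, and `sendBlocking` are hypothetical stand-ins for the sink and the blocking gRPC call, and no Flink classes are involved. The latch exists only to make the demo deterministic. The sketch shows that the call handing a message to the pool returns right away, so the caller is free to move on to the next message while the first is still in flight.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a sink that offloads each message to a thread pool.
class FireAndForgetSink {
    private final ExecutorService pool = Executors.newFixedThreadPool(4);
    private final AtomicInteger completed = new AtomicInteger();
    // Demo-only gate: the simulated remote calls block until it is released.
    private final CountDownLatch gate = new CountDownLatch(1);

    // Called once per message; returns as soon as the work is queued on the pool.
    CompletableFuture<Void> invoke(String message) {
        return CompletableFuture.runAsync(() -> sendBlocking(message), pool);
    }

    // Placeholder for the blocking gRPC call (assumption: the real call blocks similarly).
    private void sendBlocking(String message) {
        try {
            gate.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return;
        }
        completed.incrementAndGet();
    }

    void releaseRemoteCalls() { gate.countDown(); }

    int completedCount() { return completed.get(); }

    void close() { pool.shutdown(); }
}
```

Note that nothing in this sketch limits how many calls are in flight at once, and the caller never learns whether a call failed unless it inspects the returned future.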

Solution

  • I would use Flink's built-in support for async operators (Async I/O) and follow it with a DiscardingSink, rather than trying to implement a custom async sink.

    And no, I don't see any reason why a persistent thread pool would cause problems.
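
A minimal sketch of the lifecycle for such a persistent pool, written in plain Java so that it runs without Flink on the classpath. All names here (`PooledAsyncCaller`, `open`, `asyncInvoke`, `close`, `callEndpoint`) are hypothetical. In a real job the same shape would presumably live in a `RichAsyncFunction`, applied with `AsyncDataStream.unorderedWait(...)` (or `orderedWait`) and followed by a `DiscardingSink`. The semaphore is an assumption standing in for the capacity limit that Flink's async operator places on in-flight requests.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical helper mirroring the open / asyncInvoke / close shape of an async operator.
class PooledAsyncCaller {
    private final int capacity;
    private ExecutorService pool;
    private Semaphore inFlight;

    PooledAsyncCaller(int capacity) { this.capacity = capacity; }

    // Create the pool once, when the operator starts.
    void open() {
        pool = Executors.newFixedThreadPool(capacity);
        inFlight = new Semaphore(capacity);
    }

    // Blocks the caller only when `capacity` requests are already in flight.
    CompletableFuture<String> asyncInvoke(String message) {
        inFlight.acquireUninterruptibly();
        try {
            return CompletableFuture
                    .supplyAsync(() -> callEndpoint(message), pool)
                    .whenComplete((result, error) -> inFlight.release());
        } catch (RuntimeException e) {
            // The pool rejected the task (for example after shutdown): give the permit back.
            inFlight.release();
            throw e;
        }
    }

    // Placeholder for the real gRPC call (assumption: it returns some acknowledgement).
    private String callEndpoint(String message) { return "ack:" + message; }

    int availablePermits() { return inFlight.availablePermits(); }

    boolean isShutdown() { return pool.isShutdown(); }

    // Shut the pool down when the operator stops; true if it terminated within the timeout.
    boolean close() {
        pool.shutdown();
        try {
            return pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

The pool lives as long as the operator does, which matches the point that a long-lived pool is not a problem in itself. Shutting it down in `close()` is ordinary hygiene so that threads do not linger after the job is cancelled or restarted.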