I am configuring a Flink job that should eventually handle close to 1 million records per second. I started with the following configuration: CPU: 4 cores, Memory: 2 GB, Task slots: 4.
Even at only 30K logs per second, the job is already very busy and shows significant backpressure. From what I've read, Flink can handle very large volumes of data, so this seems contradictory. I may be missing some configuration, so any help figuring this out would be highly appreciated.
Thank you in advance
I have tried increasing the memory and the parallelism, but it didn't help. I want to understand whether this result is expected with this configuration, or whether I should be configuring the job differently.
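Concretely, my setup corresponds roughly to the following in flink-conf.yaml (a sketch of the settings described above; the exact values in my deployment may differ):

```yaml
# flink-conf.yaml (sketch of the setup described above)
taskmanager.numberOfTaskSlots: 4        # 4 task slots, matching the 4 cores
taskmanager.memory.process.size: 2048m  # total TaskManager memory: 2 GB
```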
For a workflow reading from Kafka, doing a broadcast-stream-based enrichment, and writing to Hudi, I got a rate of about 13K records/sec/core. That was with optimizations such as using faster-serde for deserializing records from Kafka.
So with 4 cores, 30K records/second is in the right ballpark.
Note that increasing parallelism without increasing the number of cores available won't help, and typically hurts your throughput.
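To illustrate that last point, a minimal sketch for a 4-core TaskManager (the keys are standard Flink config options; the values are what I'd assume for this setup):

```yaml
# flink-conf.yaml (sketch): keep parallelism aligned with available cores
taskmanager.numberOfTaskSlots: 4  # one slot per core
parallelism.default: 4            # default job parallelism matching total slots
```

Raising `parallelism.default` above the number of available cores just adds subtask threads that contend for the same CPUs, which is why it tends to hurt rather than help.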