How does PyFlink performance compare to Flink + Scala?
Big picture: the goal is to build a Lambda architecture with a Cold and a Hot Tier. The Cold (Batch) Tier will be implemented with Apache Spark (PySpark), but for the Hot (Streaming) Tier there are different options: Spark Streaming or Flink.
Since Apache Flink is pure streaming rather than Spark's micro-batches, I tend to choose Apache Flink. My only point of concern is the performance of PyFlink. Will it have lower latency than PySpark streaming? Is it slower than Flink code written in Scala? In what cases is it slower?
Thank you in advance!
I have implemented something very similar, and from my experience these are a few things to keep in mind:
If you stick to the native (built-in) functions that PyFlink provides, you will not observe any noticeable difference in performance.
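To make the distinction concrete, here is a minimal sketch of the two styles in the PyFlink Table API. The table contents, the column name `word`, and the helper names `shout` and `build_tables` are made up for illustration. As far as I understand it, the built-in expression is evaluated inside the JVM like any other Flink job, while the Python UDF has to ship rows to a Python worker and back, which is where extra latency would come from; treat that as my reading rather than a measured result.

```python
def shout(s: str) -> str:
    """Plain Python logic (hypothetical example); wrapped as a UDF below."""
    return s.upper()


def build_tables():
    # PyFlink imports are local so the plain function above can be used without Flink.
    from pyflink.table import DataTypes, EnvironmentSettings, TableEnvironment
    from pyflink.table.expressions import col
    from pyflink.table.udf import udf

    t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())
    words = t_env.from_elements([("flink",), ("spark",)], ["word"])

    # Style 1: built-in expression, no user Python code involved per row.
    native = words.select(col("word").upper_case.alias("word"))

    # Style 2: Python UDF, the same logic but executed by user Python code.
    shout_udf = udf(shout, result_type=DataTypes.STRING())
    via_udf = words.select(shout_udf(col("word")).alias("word"))

    return native, via_udf
```

Calling `build_tables()` and then `.execute().print()` on each returned table should print the same rows; if performance matters, benchmark both styles on your own workload before committing to UDFs.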