Why is the parallel execution of an Apache Flink application slower than the sequential execution?

I have an Apache Flink setup with one TaskManager and two processing slots. When I execute an application with parallelism set as 1, the job takes around 33 seconds to execute. When I increase the parallelism to 2, the job takes 45 seconds to complete.

I am using Flink on my Windows machine with the configuration of 10 Compute Cores(4C + 6G). I want to achieve better results with 2 slots. What can I do?

Solution

Distributed systems like Apache Flink are designed to run in data centers on hundreds of machines. They are not designed to parallelize computations on a single computer. Moreover, Flink targets large-scale problems. Jobs that run in seconds on a local machine are not the primary use case for Flink.

Parallelizing an application always causes overhead. Data has to be distributed and shared between processes and threads. Flink distributes data across TaskManager slots by serializing and deserializing it. Moreover, starting and coordinating distributed tasks also does not come for free.

It is not surprising to observe longer execution times when scaling a small-scale problem with a distributed system on a single machine. You could port the application to a thread-parallel application that leverages shared memory.