What happens if the total number of parallel operator instances is higher than the parallelism of the Flink system?
Here is the scenario:
taskmanager.numberOfTaskSlots=5
and parallelism.default=5
kafkaSource1.map(Mapper1).sink(sink1);
kafkaSource2.map(Mapper2).sink(sink1);
After deploying this dataflow with a parallelism of 5, will the TaskManager be overloaded?
As far as I understand, the tasks will be spread across the TaskManager's slots like this:
The diagram is correct. If you disable operator chaining, then each slot will contain 5 tasks, as shown. Each task will have a Java thread, which will sit blocked on the network until there is input to process. All of these tasks will run independently, in parallel.
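To make the counting concrete, here is a rough back-of-the-envelope sketch in plain Java (no Flink dependency). It assumes Flink's default slot sharing, where one slot can hold one subtask of each operator, and it takes the five unchained operators per slot from the paragraph above. The class and method names are made up for illustration and are not part of any Flink API.

```java
// Hypothetical helper, not a Flink API: rough task and slot arithmetic for the scenario.
final class SlotEstimate {

    // With chaining disabled, every operator runs as its own task,
    // so the job has one subtask (and one thread) per operator per parallel instance.
    static int totalSubtasks(int operatorCount, int parallelism) {
        return operatorCount * parallelism;
    }

    // Assumption: default slot sharing. A slot can hold one subtask of each operator,
    // so the job needs as many slots as its highest operator parallelism, not the sum.
    static int slotsNeeded(int... operatorParallelisms) {
        int max = 0;
        for (int p : operatorParallelisms) {
            max = Math.max(max, p);
        }
        return max;
    }

    public static void main(String[] args) {
        int slotsAvailable = 5;  // taskmanager.numberOfTaskSlots
        int parallelism = 5;     // parallelism.default
        int operators = 5;       // unchained operators per slot, as counted in the answer

        System.out.println("subtasks in job: " + totalSubtasks(operators, parallelism));
        System.out.println("slots needed:    " + slotsNeeded(5, 5, 5, 5, 5));
        System.out.println("slots available: " + slotsAvailable);
    }
}
```

Under these assumptions the job has 25 subtasks but still fits into the 5 available slots, which is why having more operator instances than slots is not a problem in itself.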
However, disabling operator chaining is a very bad idea. You will pay a large performance penalty for this, because it will cause serialization/deserialization to occur where it isn't needed. (Also, if the mappers are simply doing deserialization from Kafka, you will get better performance if you use an appropriate KafkaDeserializationSchema, and eliminate the mappers.)
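As a toy illustration of where that penalty comes from, the plain-Java sketch below contrasts handing a record straight to the next function with writing it to bytes and reading it back first. This is only a model of the idea: it does not use Flink's serializers, network stack, or threading, and `ChainingToyModel`, `chained`, and `unchained` are hypothetical names chosen for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.function.Function;

// Toy model only: stands in for two consecutive operators, not for Flink internals.
final class ChainingToyModel {

    // "Chained": the second function is invoked directly on the first one's result.
    static String chained(Function<String, String> first,
                          Function<String, String> second,
                          String record) {
        return second.apply(first.apply(record));
    }

    // "Unchained": the intermediate record is serialized to bytes and
    // deserialized again before the second function sees it.
    static String unchained(Function<String, String> first,
                            Function<String, String> second,
                            String record) {
        String intermediate = first.apply(record);
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(buffer)) {
                out.writeUTF(intermediate);
            }
            try (DataInputStream in = new DataInputStream(
                    new ByteArrayInputStream(buffer.toByteArray()))) {
                return second.apply(in.readUTF());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Function<String, String> trim = String::trim;
        Function<String, String> upper = String::toUpperCase;
        System.out.println(chained(trim, upper, " click "));
        System.out.println(unchained(trim, upper, " click "));
    }
}
```

Both paths produce the same result; the second simply does extra work per record, and that extra work is what chaining lets you avoid between operators that can run in the same task.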
Will the task managers be overloaded? Probably not, provided you make good choices about operator chaining, etc. I would only be worried if the mappers are doing something unusually expensive. But it depends, in part, on the throughput you need to achieve.