I run Flink on YARN in per-job mode. The YARN cluster has 500 vcores and 2000 GB of RAM, and my Flink application has large state. How should I set the slot count: many slots per TaskManager with fewer TaskManagers, or few slots per TaskManager with more TaskManagers?
Example:
Which one will have better performance?
It depends. In part it depends on which state backend you are using, and on what "better performance" means for your application. Whether you are running batch or streaming workloads also makes a difference, and the job's topology can also be a factor.
If you are using RocksDB as the state backend, then having fewer, larger task managers is probably the way to go. With state on the heap, larger task managers are more likely to disrupt processing with significant GC pauses, which argues for having more, smaller TMs. But this mostly impacts worst-case latency for streaming jobs, so if you are running batch jobs, or only care about streaming throughput, then this might not be worth considering.
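To make the trade-off concrete, here is a rough sizing sketch for the cluster in the question (500 vcores, 2000 GB). It is only arithmetic, not a Flink API: the function name `candidate_layouts` and the slot counts tried are made up for illustration, and it assumes one vcore per slot and that the whole budget goes to TaskManagers (it ignores the JobManager container and any YARN overhead).

```python
def candidate_layouts(total_vcores, total_mem_gb, slots_per_tm_options, vcores_per_slot=1):
    """Enumerate hypothetical TaskManager layouts for a fixed resource budget.

    Assumptions (not taken from the question): one vcore per slot by default,
    and all vcores and memory are available to TaskManagers.
    """
    total_slots = total_vcores // vcores_per_slot
    rows = []
    for slots in slots_per_tm_options:
        task_managers = total_slots // slots       # whole TaskManagers only
        mem_per_tm_gb = total_mem_gb / task_managers
        rows.append({
            "slots_per_tm": slots,
            "task_managers": task_managers,
            "mem_per_tm_gb": mem_per_tm_gb,
            "mem_per_slot_gb": mem_per_tm_gb / slots,
        })
    return rows


if __name__ == "__main__":
    # Few large TMs vs. many small TMs for the cluster described in the question.
    for row in candidate_layouts(500, 2000, [1, 5, 10]):
        print(row)
```

Under these assumptions the memory available per slot is the same in every layout (2000 GB / 500 slots = 4 GB), so the choice is really about how much memory, and with heap state how much garbage-collected heap, ends up inside a single JVM.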
Communication between slots in the same TM can be optimized, since the data can stay local instead of going over the network, but this isn't a factor if your job doesn't do any inter-slot communication.