hadoop, apache-flink

All TaskManagers on the same NodeManager, which leads to high pressure on HDFS


We have a 100-node Hadoop cluster. I wrote a Flink app that writes many files to HDFS via BucketingSink. When I run the Flink app on YARN, all TaskManagers are placed on the same NodeManager, which means all subtasks run on that single node. This opens many file descriptors on that node's DataNode (I think the Flink filesystem connector connects to the local DataNode by preference). The resulting pressure on that node easily fails the job.
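
For context, the write path looks roughly like this (a minimal sketch; the source, HDFS path, and bucketing format are placeholders, and the class names come from flink-connector-filesystem):

    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink;
    import org.apache.flink.streaming.connectors.fs.bucketing.DateTimeBucketer;

    public class HdfsWriteJob {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

            // Placeholder source; the real job reads from an upstream system.
            DataStream<String> records = env.socketTextStream("localhost", 9999);

            // Placeholder HDFS path; each subtask opens its own part files here.
            BucketingSink<String> sink = new BucketingSink<>("hdfs:///data/output");
            sink.setBucketer(new DateTimeBucketer<>("yyyy-MM-dd--HH")); // one bucket per hour
            sink.setBatchSize(1024 * 1024 * 128);                       // roll part files at 128 MB

            records.addSink(sink);
            env.execute("hdfs-bucketing-sink-job");
        }
    }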

Any good ideas on how to solve this problem? Thank you very much!


Solution

  • This sounds like a YARN scheduling problem. Take a look at YARN's capacity scheduler, which allows you to schedule containers on nodes based on the available capacity. Moreover, you can tell YARN to also consider virtual cores (vcores) for scheduling. This lets you define a resource dimension other than memory alone; see the config sketch below.
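
By default the capacity scheduler counts only memory (DefaultResourceCalculator). Switching to the DominantResourceCalculator in capacity-scheduler.xml makes it consider vcores as well (a sketch of the relevant property; it assumes the capacity scheduler is already the configured scheduler on your cluster):

    <!-- capacity-scheduler.xml: count vcores as well as memory when
         placing containers; DominantResourceCalculator applies dominant
         resource fairness across both dimensions. -->
    <property>
      <name>yarn.scheduler.capacity.resource-calculator</name>
      <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
    </property>

With vcores counted, a node whose CPU allocation is already consumed by TaskManager containers stops receiving new ones, so YARN spreads the remaining containers across other NodeManagers.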