I had a question about memory configuration. I am running a 3-node cluster with Spark, Cassandra, Hadoop, Thrift, and YARN. I want to store my files in HDFS, so I loaded Hadoop. I am finding that I run out of memory when running my queries. I was able to figure out how to restrict Cassandra to less than 4 GB. Is there such a setting for Hadoop? How about YARN? As I only use Hadoop to load my flat files, I think setting it to 1 or 2 GB should be fine. My boxes have 32 GB of RAM and 16 cores each.
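(For anyone finding this later: the standard way to cap Cassandra's heap is in cassandra-env.sh. A minimal sketch, assuming the stock packaging; the 4G/800M values are illustrative:)

    # cassandra-env.sh -- cap the JVM heap
    # If you set MAX_HEAP_SIZE you must also set HEAP_NEWSIZE,
    # or Cassandra will refuse to start.
    MAX_HEAP_SIZE="4G"
    # Young-gen size; usual guidance is ~100 MB per core,
    # capped at 1/4 of MAX_HEAP_SIZE.
    HEAP_NEWSIZE="800M"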
It is hard to say without the error message you are facing, but if you want to control how much memory is allocated on your workers, you can set these two properties in your yarn-site.xml:
    <property>
      <name>yarn.nodemanager.resource.memory-mb</name>
      <value>40960</value> <!-- total RAM YARN may allocate on this node: 40 GB -->
    </property>
    <property>
      <name>yarn.scheduler.maximum-allocation-mb</name>
      <value>8192</value> <!-- max RAM per container: 8 GB -->
    </property>

Note that yarn.nodemanager.resource.memory-mb has to stay below physical RAM. On your 32 GB boxes, something like 24576 (24 GB) would leave headroom for Cassandra and the OS.
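Since you also asked about Hadoop itself: the HDFS daemons (NameNode/DataNode) take their heap from hadoop-env.sh rather than yarn-site.xml. A minimal sketch, assuming Hadoop 2.x, where HADOOP_HEAPSIZE is in MB:

    # hadoop-env.sh -- default heap for all Hadoop daemons, in MB
    export HADOOP_HEAPSIZE=1024

    # Or per-daemon, which overrides the default:
    export HADOOP_NAMENODE_OPTS="-Xmx1g $HADOOP_NAMENODE_OPTS"
    export HADOOP_DATANODE_OPTS="-Xmx1g $HADOOP_DATANODE_OPTS"

Since you only use Hadoop to load flat files, 1 to 2 GB per daemon should be plenty, as you guessed.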
You can see more details in this related question.
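One more thing to watch: since Spark runs on YARN here, each executor plus its overhead must fit inside a YARN container. A minimal sketch of spark-defaults.conf, assuming an older Spark release (newer versions call the second property spark.executor.memoryOverhead); the values are illustrative and sized to fit the 8192 MB container maximum above:

    # spark-defaults.conf -- 7 GB heap + 1 GB overhead = 8192 MB,
    # which fits exactly under yarn.scheduler.maximum-allocation-mb
    spark.executor.memory              7g
    spark.yarn.executor.memoryOverhead 1024

If executor memory plus overhead exceeds the container maximum, YARN will simply refuse to launch the executor.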