I have 11 nodes, each with 2 GB of memory and 16 cores. I tried to submit my Spark application using this:
./bin/spark-submit --class myapp.Main --master spark://Name:7077 --conf spark.shuffle.memoryFraction=0 --executor-memory 2G --deploy-mode client /home/mbala/fer/myjars7/etlpersist.jar /home/mfile80.csv
In the slaves file I didn't add the IP of the node from which I launch this command, because I think that in client mode the driver must run on that node.
But whenever I try to run it, I get an out-of-memory exception (sometimes from GC overhead, sometimes from the heap). I tried many of the solutions suggested on the Spark website and here on Stack Overflow, I tried to minimize my code, and I used the MEMORY_AND_DISK storage level, but I still have this problem.
PS: I use the following setting because I found it suggested as a solution on this forum:
--conf spark.shuffle.memoryFraction=0
Should I reduce the number of cores? I think that 16 cores with only 2 GB of memory won't be enough for the shuffle.
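For example, I imagine I could cap the cores per executor like this (--executor-cores is the spark-submit flag I have in mind; the value 4 is just a guess on my part):

./bin/spark-submit --class myapp.Main --master spark://Name:7077 --executor-cores 4 --executor-memory 2G --deploy-mode client /home/mbala/fer/myjars7/etlpersist.jar /home/mfile80.csv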
Can you please try using a lowercase g (for example, 2g) with the --executor-memory and --driver-memory options in your command? That should help.
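For example, adapting your command (the 1g driver value is only an illustration; your other options stay the same):

./bin/spark-submit --class myapp.Main --master spark://Name:7077 --executor-memory 2g --driver-memory 1g --deploy-mode client /home/mbala/fer/myjars7/etlpersist.jar /home/mfile80.csv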
When you set the executor memory to 2 GB, Spark assigns only a fraction of it (spark.memory.fraction, 0.6 by default, applied after some memory is reserved) to storage and execution combined, and half of that (spark.memory.storageFraction, 0.5 by default) is set aside as storage memory. Hence only the remaining half of that 0.6 fraction is available for execution.
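As a rough sketch of that arithmetic for a 2 GB executor (assuming the defaults spark.memory.fraction=0.6 and spark.memory.storageFraction=0.5, and the roughly 300 MB the unified memory manager reserves):

usable heap      = 2048 MB - 300 MB  = 1748 MB
unified memory   = 1748 MB * 0.6     ≈ 1049 MB  (execution + storage)
storage memory   = 1049 MB * 0.5     ≈  524 MB  (set aside for caching)
execution memory ≈  524 MB                      (the rest of the unified pool)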
You should understand this memory-management concept; it will help you debug the application.