I'm considering Apache Spark for data analysis. In the past I've experienced Java/Scala slowdowns on 4-socket servers due to NUMA architecture and objects being local to a single node. The solution was to start a separate pinned JVM for each NUMA node and have them talk to each other using Akka.
How will NUMA be handled by Spark to avoid similar slowdowns?
If you start Spark with --executor-cores=32 (assuming 8 virtual cores per socket), you will run into the same issues. Instead, you can start 4 workers per machine, each with --executor-cores=8, and then pin each of these executors to one NUMA node.
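One way to do the pinning is `numactl`, binding both CPU and memory allocation for each worker to the same NUMA node. A hedged sketch (the master URL, the 4-node/8-core layout, and `SPARK_HOME` are placeholders for your setup; note that running several workers on one host also typically requires `SPARK_WORKER_INSTANCES` or distinct ports and work directories):

```shell
#!/usr/bin/env bash
# Sketch: one Spark worker per NUMA node, pinned with numactl.
# Printed as a dry run; launch the commands once the placeholders match your cluster.
cmds=()
for node in 0 1 2 3; do
  # --cpunodebind pins the worker's threads, --membind its memory,
  # so executor objects stay local to that node.
  cmds+=("numactl --cpunodebind=$node --membind=$node \
$SPARK_HOME/sbin/start-worker.sh spark://master:7077 --cores 8")
done
printf '%s\n' "${cmds[@]}"
```

The key idea is that each JVM never spans a socket boundary, so all of its heap allocations are node-local, which is the same effect you previously got from separate pinned JVMs talking over Akka.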
This setup will incur more communication overhead, but it is likely a good trade-off: Spark already tries to minimize communication between executors, since in the typical case they sit on different machines anyway.