Tags: java, scala, apache-spark, numa, numactl

Spark on NUMA systems


I'm considering Apache Spark for data analysis. In the past I've experienced Java/Scala slowdowns on 4-socket servers due to NUMA architecture and objects being local to a single node. The solution was to start a separate pinned JVM for each NUMA node and have them talk to each other using Akka.

How is NUMA handled in Spark to avoid similar situations?


Solution

  • If you start Spark with --executor-cores=32 (assuming 8 virtual cores per socket) you will run into the same issues. Instead, you can start 4 workers per machine, each with --executor-cores=8, and then pin those executors to the NUMA nodes (see the sketch after this answer).

    This setup incurs more communication overhead, but it is likely a good trade-off. Spark already tries to minimize communication between executors, since in the typical case they sit on different machines.
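
    The pinning itself can be done with numactl when launching the standalone workers by hand. The following is a minimal sketch of that idea, not a definitive recipe: it assumes Spark standalone mode, 4 NUMA nodes with 8 cores each, a master already running at spark://master:7077, and placeholder values for the install path, memory size, and work directories.

      # Launch one standalone worker per NUMA node, binding each worker's
      # CPUs and memory to that node with numactl so its JVM allocates
      # only node-local memory.
      # Assumptions: 4 NUMA nodes (0-3), 8 cores and ~32g usable RAM per
      # node, master at spark://master:7077 (placeholder URL).
      export SPARK_HOME=/opt/spark          # adjust to your installation

      for node in 0 1 2 3; do
        numactl --cpunodebind=$node --membind=$node \
          "$SPARK_HOME/bin/spark-class" org.apache.spark.deploy.worker.Worker \
            --cores 8 --memory 32g \
            --work-dir "/tmp/spark-worker-$node" \
            --webui-port $((8081 + node)) \
            spark://master:7077 &
      done
      wait

    An application submitted with --executor-cores 8 (and, for example, --total-executor-cores 32) then gets one executor per pinned worker, so each executor JVM stays within a single NUMA node.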