Tags: apache-spark, networking, cluster-computing, apache-spark-standalone

Can I distribute work with Apache Spark Standalone version?


I hear people talking about an "Apache Spark Standalone Cluster", which confuses me: I understand a "cluster" as several machines connected by a (potentially fast) network and working in parallel, and "standalone" as a machine or program that is isolated. So the question is: can Apache Spark Standalone do distributed work across a network? If it can, how does it differ from the non-standalone versions?


Solution

  • Standalone (not to be confused with local mode) means that Spark uses its own resource management utilities instead of an external resource manager (YARN, Mesos). It can be distributed across machines the same way as Spark running on other cluster managers.

    Spark in local mode, by contrast, runs in a single JVM. It cannot be distributed (although, within the limits of a single machine, it is still parallelized across threads) and is useful only for development and testing. The practical switch between the two is the master URL, as the sketch below shows.
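
    A minimal sketch in Scala, assuming a hypothetical standalone master host spark-master.example.com listening on the default port 7077; only the master URL changes between distributed and single-machine execution:

        import org.apache.spark.sql.SparkSession

        object MasterUrlSketch {
          def main(args: Array[String]): Unit = {
            val spark = SparkSession.builder()
              .appName("master-url-sketch")
              // Standalone cluster manager: point the driver at Spark's own
              // master process; executors run on the cluster's worker nodes.
              // (The host name here is a placeholder, not a real cluster.)
              .master("spark://spark-master.example.com:7077")
              // Local mode (swap in instead of the line above): everything
              // stays in this one JVM, parallelized across threads;
              // "local[*]" uses one thread per available core.
              // .master("local[*]")
              .getOrCreate()

            // Trivial job to confirm executors are reachable.
            println(spark.sparkContext.parallelize(1 to 1000).sum())

            spark.stop()
          }
        }

    Run against the spark:// URL, the job's tasks are scheduled onto the standalone cluster's workers; with local[*] they never leave the single JVM.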