apache-spark, apache-spark-standalone

Spark resource scheduling - Standalone cluster manager


I have a pretty low-configuration testing machine for the data pipelines I develop in Spark. I will use only one AWS t2.large instance, which has just 2 CPUs and 8 GB of RAM.

I need to run 2 Spark Streaming jobs, as well as leave some memory and CPU power for occasional batch-job testing.

So I have a master and one worker, both on the same machine.

I have some general questions:

1) How many executors can run per worker? I know the default is one, but does it make sense to change this?

2) Can one executor execute multiple applications, or is one executor dedicated to only one application?

3) Is the way to make this work to set the memory an application can use in a configuration file, or when I create the SparkContext?

Thank you


Solution

  • How many executors can run per worker? I know the default is one, but does it make sense to change this?

    It makes sense only if you have enough resources. Say, on a machine with 24 GB and 12 cores, it's possible to run 3 executors if you're sure that 8 GB is enough for one executor.
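
    For example, on such a 24 GB / 12-core worker you can let the standalone master place several executors of a single application on the same worker by fixing the per-executor size. A minimal Scala sketch (the app name and master URL are placeholders):

        import org.apache.spark.{SparkConf, SparkContext}

        // Each executor asks for 8 GB and 4 cores, so a 24 GB / 12-core worker
        // can host up to 3 executors for this one application.
        val conf = new SparkConf()
          .setAppName("sizing-example")            // placeholder name
          .setMaster("spark://master-host:7077")   // placeholder standalone master URL
          .set("spark.executor.memory", "8g")
          .set("spark.executor.cores", "4")
          .set("spark.cores.max", "12")            // optional cap on total cores for the app

        val sc = new SparkContext(conf)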

  • Can one executor execute multiple applications, or is one executor dedicated to only one application?

    Nope, every application starts its own executors.

  • Is the way to make this work to set the memory an application can use in a configuration file, or when I create the SparkContext?

    I'm not sure I understand the question, but there are 3 ways to provide configuration for an application (a short sketch combining them follows after the list below):

    • the file spark-defaults.conf; just don't forget to enable loading of default properties (the loadDefaults flag) when you create a new SparkConf instance.
    • providing system properties through -D when you run the application, or --conf if you use spark-submit or spark-shell. For memory there are specific options such as spark.executor.memory or spark.driver.memory, among others.
    • providing the same options through a new SparkConf instance using its set methods.
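
    As a rough Scala sketch of how the three options fit together (the application name and property values are just illustrative):

        import org.apache.spark.{SparkConf, SparkContext}

        // 1) values from spark-defaults.conf (e.g. "spark.executor.memory 2g")
        //    are picked up as long as loadDefaults is true, which it is by default
        val conf = new SparkConf(/* loadDefaults = */ true)

        // 2) alternatively, pass them on the command line, e.g.
        //      spark-submit --conf spark.executor.memory=2g ...
        //    or as a system property: -Dspark.executor.memory=2g

        // 3) or set them programmatically before the context is created
        conf.setAppName("memory-example")        // placeholder name
            .set("spark.executor.memory", "2g")

        // note: spark.driver.memory only takes effect if set before the driver
        // JVM starts (via spark-submit or spark-defaults.conf), not from code here
        val sc = new SparkContext(conf)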