Spark JobServer, memory settings for release

I've set up a spark-jobserver to enable complex queries on a reduced dataset.

The jobserver executes two operations:

Sync with the main remote database: it dumps some of the server's tables, reduces and aggregates the data, saves the result as a Parquet file, and caches it as an SQL table in memory. This operation runs every day.
Queries: once the sync operation has finished, users can run complex SQL queries on the aggregated dataset and, optionally, export the result as a CSV file. Each user can run only one query at a time and must wait for it to complete.

The biggest table (both before and after the reduction, which also includes some joins) has almost 30M rows, with at least 30 fields.

Currently I'm working on a dev machine with 32 GB of RAM dedicated to the job server, and everything runs smoothly. The problem is that on the production machine the same amount of RAM is shared with a PredictionIO server.

I'm asking how to determine the memory configuration to avoid memory leaks or crashes in Spark.

I'm new to this, so any reference or suggestion is welcome.

Thank you

Solution

As an example, if you have a server with 32 GB of RAM, set the following parameter:

 spark.executor.memory = 32g

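Note (the example below is taken from the Cloudera post linked at the end; it assumes a cluster of six nodes, each with 16 cores and 64 GB of memory, where the YARN NodeManagers are given 63 GB and 15 cores per node):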

The likely first impulse would be to use --num-executors 6 --executor-cores 15 --executor-memory 63G. However, this is the wrong approach because:

63GB + the executor memory overhead won’t fit within the 63GB capacity of the NodeManagers.
The application master will take up a core on one of the nodes, meaning that there won’t be room for a 15-core executor on that node.
15 cores per executor can lead to bad HDFS I/O throughput.

A better option would be to use --num-executors 17 --executor-cores 5 --executor-memory 19G. Why?

This config results in three executors on all nodes except for the one with the AM (application master), which will have two executors. --executor-memory was derived as 63 / 3 executors per node = 21. The memory overhead is estimated as 21 * 0.07 = 1.47, and 21 - 1.47 ~ 19.
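The arithmetic above can be reproduced with a small script. This is only a sketch: the function names `fits_on_node` and `size_executors` are made up for illustration, the six-node figure comes from the Cloudera post rather than from this answer, and the 0.07 overhead factor is the one used in the calculation above, so it may not match the default of your Spark version.

```python
import math

# Overhead factor used in the answer's arithmetic (assumption: check your Spark version).
OVERHEAD_FRACTION = 0.07


def fits_on_node(executor_memory_gb, executors_per_node, node_memory_gb,
                 overhead_fraction=OVERHEAD_FRACTION):
    """Return True if heap plus overhead of all executors fits in the node capacity."""
    per_executor_gb = executor_memory_gb * (1 + overhead_fraction)
    return per_executor_gb * executors_per_node <= node_memory_gb


def size_executors(nodes, node_memory_gb, node_cores, cores_per_executor,
                   overhead_fraction=OVERHEAD_FRACTION):
    """Derive (num_executors, executor_cores, executor_memory_gb) as in the text."""
    executors_per_node = node_cores // cores_per_executor
    # One executor slot is given up on the node that hosts the application master.
    num_executors = nodes * executors_per_node - 1
    # Split the node memory evenly, then subtract the overhead share and round down.
    memory_share_gb = node_memory_gb / executors_per_node
    executor_memory_gb = math.floor(memory_share_gb * (1 - overhead_fraction))
    return num_executors, cores_per_executor, executor_memory_gb


# Six nodes, 63 GB and 15 cores usable per node, 5 cores per executor.
print(size_executors(6, 63, 15, 5))   # (17, 5, 19)
print(fits_on_node(63, 1, 63))        # False: 63 GB plus overhead exceeds 63 GB
print(fits_on_node(19, 3, 63))        # True: three 19 GB executors fit
```

For a single machine shared with another service, the same functions can be called with nodes=1 and node_memory_gb set to whatever is left after the other service's share, which is something you would have to measure yourself.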

This is explained in more detail here: http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/