Zeppelin configuration: is there a cache somewhere?


It appears I'm missing something in the way Zeppelin reads interpreter-specific configuration.

For example, I set spark.cores.max to 12 in zeppelin-env.sh and in spark-defaults.conf in $SPARK_HOME/conf, but starting the Spark interpreter launched a Spark application with only 4 cores.

Then I changed that property in the interpreter UI of Zeppelin and it worked.

  • Where are the properties set using the UI (web page) stored?
  • Is that UI supposed to be 'in sync' with zeppelin-env.sh or zeppelin-site.xml?

Solution

  • There is a hierarchy here:

    • parameters set in the interpreter UI take precedence over those specified in zeppelin-env.sh;
    • parameters configured in zeppelin-env.sh take precedence over those specified in spark-defaults.conf; and,
    • if a parameter is set in neither of the above, it falls back to the value specified in spark-defaults.conf.

    There is an important parallel here with what one would expect of any Spark application:

    • configuration parameters explicitly set within an application take precedence over those specified with spark-submit;
    • parameters specified with spark-submit take precedence over those specified in spark-defaults.conf; and,
    • if a parameter is set in neither of the above, it falls back to the value specified in spark-defaults.conf.

    So what you are observing is to be expected, although I too find it confusing (and not particularly well documented anywhere).
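
Both hierarchies above follow the same rule: the first layer that defines a property wins. The following is a minimal Python sketch of that lookup, not how Zeppelin or Spark actually implement it. The `resolve` helper and the layer contents are hypothetical; in particular, the value 4 in the interpreter UI layer is an assumption chosen to reproduce the behaviour described in the question.

```python
def resolve(key, layers):
    """Return (value, source) from the first layer that defines `key`.

    `layers` is a list of (name, settings) pairs ordered from highest to
    lowest precedence. Returns (None, None) if no layer defines the key.
    """
    for name, settings in layers:
        if key in settings:
            return settings[key], name
    return None, None


# Zeppelin ordering described in the answer (highest precedence first).
# ASSUMPTION: the UI already held spark.cores.max=4; the question does not
# say this, it is only one way to explain why 12 was ignored.
zeppelin_layers = [
    ("interpreter UI", {"spark.cores.max": "4"}),
    ("zeppelin-env.sh", {"spark.cores.max": "12"}),
    ("spark-defaults.conf", {"spark.cores.max": "12"}),
]

# Plain Spark application ordering (highest precedence first).
# All values here are illustrative only.
spark_layers = [
    ("application code", {}),
    ("spark-submit", {"spark.executor.memory": "4g"}),
    ("spark-defaults.conf", {"spark.executor.memory": "2g",
                             "spark.cores.max": "12"}),
]

print(resolve("spark.cores.max", zeppelin_layers))     # ('4', 'interpreter UI')
print(resolve("spark.executor.memory", spark_layers))  # ('4g', 'spark-submit')
print(resolve("spark.cores.max", spark_layers))        # ('12', 'spark-defaults.conf')
```

Under these assumptions, a value already present in the interpreter UI hides whatever is set in zeppelin-env.sh or spark-defaults.conf, which matches the observation that only a change made in the UI took effect.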