apache-spark-sql, datastax, datastax-enterprise

Datastax 6 standalone analytics server


I downloaded DataStax Enterprise 6 and would like to spin up a single analytics node (on macOS El Capitan); Spark alone is fine, but Spark + Search would be even better. I extracted the gz, configured the directory structures, and executed dse cassandra -ks. Startup seems to work just fine and I can get to the Spark master, but the problem comes when I run dse spark-sql (or just dse spark): I constantly get the error below. Is it possible to set up a single node for development?

ERROR [ExecutorRunner for app-20180623083819-0000/212] 2018-06-23 08:40:28,323 SPARK-WORKER Logging.scala:91 - Error running executor
java.lang.IllegalStateException: Cannot find any build directories.
    at org.apache.spark.launcher.CommandBuilderUtils.checkState(CommandBuilderUtils.java:248) ~[spark-launcher_2.11-2.2.0.14.jar:2.2.0.14]
    at org.apache.spark.launcher.AbstractCommandBuilder.getScalaVersion(AbstractCommandBuilder.java:240) ~[spark-launcher_2.11-2.2.0.14.jar:2.2.0.14]
    at org.apache.spark.launcher.AbstractCommandBuilder.buildClassPath(AbstractCommandBuilder.java:194) ~[spark-launcher_2.11-2.2.0.14.jar:2.2.0.14]
    at org.apache.spark.launcher.AbstractCommandBuilder.buildJavaCommand(AbstractCommandBuilder.java:117) ~[spark-launcher_2.11-2.2.0.14.jar:2.2.0.14]
    at org.apache.spark.launcher.WorkerCommandBuilder.buildCommand(WorkerCommandBuilder.scala:39) ~[spark-core_2.11-2.2.0.14.jar:2.2.0.14]
    at org.apache.spark.launcher.WorkerCommandBuilder.buildCommand(WorkerCommandBuilder.scala:45) ~[spark-core_2.11-2.2.0.14.jar:2.2.0.14]
    at org.apache.spark.deploy.worker.CommandUtils$.buildCommandSeq(CommandUtils.scala:63) ~[spark-core_2.11-2.2.0.14.jar:6.0.0]
    at org.apache.spark.deploy.worker.CommandUtils$.buildProcessBuilder(CommandUtils.scala:51) ~[spark-core_2.11-2.2.0.14.jar:6.0.0]
    at org.apache.spark.deploy.worker.ExecutorRunner.fetchAndRunExecutor(ExecutorRunner.scala:150) ~[spark-core_2.11-2.2.0.14.jar:6.0.0]
    at org.apache.spark.deploy.worker.DseExecutorRunner$$anon$2.run(DseExecutorRunner.scala:80) [dse-spark-6.0.0.jar:6.0.0]
INFO  [dispatcher-event-loop-7] 2018-06-23 08:40:28,323 SPARK-WORKER Logging.scala:54 - Executor app-20180623083819-0000/212 finished with state FAILED message java.lang.IllegalStateException: Cannot find any build directories.
INFO  [dispatcher-event-loop-7] 2018-06-23 08:40:28,324 SPARK-MASTER Logging.scala:54 - Removing executor app-20180623083819-0000/212 because it is FAILED
INFO  [dispatcher-event-loop-0] 2018-06-23 08:40:30,288 SPARK-MASTER Logging.scala:54 - Received unregister request from application app-20180623083819-0000
INFO  [dispatcher-event-loop-0] 2018-06-23 08:40:30,292 SPARK-MASTER Logging.scala:54 - Removing app app-20180623083819-0000
INFO  [dispatcher-event-loop-0] 2018-06-23 08:40:30,295 SPARK-MASTER CassandraPersistenceEngine.scala:50 - Removing existing object 

Solution

  • Check dse.yaml for directories under /var/lib/... - you should have write access to them. For example, check that the DSEFS directories, the AlwaysOn SQL directories, etc. are correctly configured (see the permission-check sketch after the script below).

    But in reality, the issue should be fixed by starting DSE with dse cassandra -s -k instead of -sk... (see the start-up example at the end of this answer).

    P.S. I'm using the following script to point logs, data, etc. to specific directories:

    export CASSANDRA_HOME=WHERE_YOU_EXTRACTED
    export DATA_BASE_DIR=SOME_DIRECTORY
    export DSE_DATA=${DATA_BASE_DIR}/data
    export DSE_LOGS=${DATA_BASE_DIR}/logs
    # set up where you want logs so you don't have to mess with logback.xml files
    export CASSANDRA_LOG_DIR=$DSE_LOGS/cassandra
    mkdir -p $CASSANDRA_LOG_DIR
    # so we don't have to play with dse-spark-env.sh
    export SPARK_WORKER_DIR=$DSE_DATA/spark/worker
    # new setting in 6.0, in older versions set SPARK_LOCAL_DIRS
    export SPARK_EXECUTOR_DIRS=$DSE_DATA/spark/rdd
    export SPARK_LOCAL_DIRS=$DSE_DATA/spark/rdd
    mkdir -p $SPARK_LOCAL_DIRS
    export SPARK_WORKER_LOG_DIR=$DSE_DATA/spark/worker/
    export SPARK_MASTER_LOG_DIR=$DSE_DATA/spark/master
    # if you want to run the always on sql server
    export ALWAYSON_SQL_LOG_DIR=$DSE_DATA/spark/alwayson_sql_server
    export ALWAYSON_SQL_SERVER_AUDIT_LOG_DIR=$DSE_DATA/spark/alwayson_sql_server
    # so Tomcat logs for Solr go to a place we know
    export TOMCAT_LOGS=$DSE_LOGS/tomcat
    
    export PATH=${CASSANDRA_HOME}/bin:${CASSANDRA_HOME}/resources/cassandra/tools/bin:$PATH
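
    A minimal sketch of the write-access check mentioned in the first point. The directory names below are only examples of typical defaults - the actual paths come from your dse.yaml and may differ in your install:

    # verify that the DSE process can write to its data/log directories
    # (adjust the list to whatever paths your dse.yaml actually points to)
    for dir in /var/lib/dsefs /var/lib/spark /var/lib/cassandra; do
        if [ -d "$dir" ] && [ ! -w "$dir" ]; then
            echo "No write access to $dir - fix ownership or point dse.yaml elsewhere"
        fi
    done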
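
    And a short start-up example once the environment script has been sourced; dse-env.sh is just a placeholder name for wherever you saved the script above:

    # load the custom directory layout
    source ~/dse-env.sh
    # start a single node with Search (-s) and Spark analytics (-k) enabled
    dse cassandra -s -k
    # once the node is up, the Spark SQL shell should start without the executor error
    dse spark-sql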