
(bdutil) Unable to get hadoop/spark cluster working with a fresh install

I'm setting up a tiny cluster in GCE to play around with it, but although the instances are created, some failures prevent it from working. I'm following the steps in

So far I'm using what are currently the latest versions of gcloud (143.0.0) and bdutil (1.3.5), both freshly installed.

./bdutil deploy -e extensions/spark/

using debian-8 as the image (as bdutil still uses debian-7-backports).

At some point I got

Fri Feb 10 16:19:34 CET 2017: Command failed: wait ${SUBPROC} on line 326.
Fri Feb 10 16:19:34 CET 2017: Exit code of failed command: 1

Full debug output is in (project ID and bucket names changed)

Instances are created, but Spark is not even installed. Digging a bit, I've managed to run the Spark installation and the Hadoop start commands on the master after connecting via ssh. But it fails badly when starting the spark-shell:

17/02/10 15:53:20 INFO gcs.GoogleHadoopFileSystemBase: GHFS version: 1.4.5-hadoop1
17/02/10 15:53:20 INFO gcsio.FileSystemBackedDirectoryListCache: Creating '/hadoop_gcs_connector_metadata_cache' with createDirectories()...
java.lang.RuntimeException: java.lang.RuntimeException: java.nio.file.AccessDeniedException: /hadoop_gcs_connector_metadata_cache
    at org.apache.hadoop.hive.ql.session.SessionState.start(

and I'm not able to import sparkSQL. From what I've read, everything should be started automatically.

At this point I'm a bit lost and don't know what else to do. Am I missing a step? Are any of the commands faulty? Thanks in advance.
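
For anyone hitting the same AccessDeniedException: the log shows the GCS connector trying to create '/hadoop_gcs_connector_metadata_cache' and being refused, so a quick check of whether the user launching spark-shell can create or write to that directory may help narrow things down. This is only a diagnostic sketch, not a fix. The function name `check_cache_dir` is made up for illustration and is not part of bdutil or the connector, and the default path is simply the one from the log.

```shell
#!/usr/bin/env bash
# Diagnostic sketch (hypothetical helper, not part of bdutil):
# report whether the current user can write to, or create, a directory.
# The default path is the one reported in the spark-shell log.
check_cache_dir() {
  local dir="${1:-/hadoop_gcs_connector_metadata_cache}"
  local parent
  parent="$(dirname "$dir")"
  if [ -d "$dir" ]; then
    # Directory already exists: only write permission matters.
    if [ -w "$dir" ]; then
      echo "writable: $dir"
    else
      echo "exists but not writable by $(id -un): $dir"
      return 1
    fi
  elif [ -w "$parent" ]; then
    # Directory is missing, but the parent allows creating it.
    echo "missing, but $(id -un) can create it under $parent"
  else
    # Directory is missing and the parent is not writable.
    echo "missing, and $(id -un) cannot create it under $parent"
    return 1
  fi
}
```

Run `check_cache_dir` with no arguments as the same user that starts spark-shell on the master. A non-zero return suggests the failure is a plain filesystem permission problem rather than something specific to Spark.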

Update: solved

As pointed out in the accepted solution, I cloned the repo and the cluster was created without issues. When trying to start the spark-shell, though, it gave

java.lang.RuntimeException: GoogleHadoopFileSystem has been closed or not initialized.

That sounded to me like the connectors were not initialized properly, so after running

 ./bdutil --env_var_files extensions/spark/, run_command_group install_connectors

it worked as expected.


  • The last version of bdutil on is a bit stale, and I'd instead recommend using the version of bdutil at head on GitHub: