scala cassandra hive apache-spark shark-sql

Has anyone successfully run Apache Spark and Shark on Cassandra?


I am trying to configure a 5-node Cassandra cluster to run Spark/Shark so I can test some Hive queries. I have installed Spark, Scala and Shark, and configured them according to AMPLab's "Running Shark on a Cluster" guide: https://github.com/amplab/shark/wiki/Running-Shark-on-a-Cluster

I can get into the Shark CLI, but when I try to create an EXTERNAL TABLE from one of my Cassandra ColumnFamily tables, I keep getting this error:

Failed with exception org.apache.hadoop.hive.ql.metadata.HiveException: Error in loading storage handler.org.apache.hadoop.hive.cassandra.CassandraStorageHandler

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
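In Hive, "Error in loading storage handler" typically means the named handler class could not be loaded, i.e. the JAR providing `org.apache.hadoop.hive.cassandra.CassandraStorageHandler` is probably not on the classpath Shark is using. The Scala sketch below checks that directly. It assumes you run it with the same classpath as the Shark CLI (for example, pasted into a Scala REPL started with Shark's jars). The `StorageHandlerCheck` object and its `check` method are hypothetical names used only for illustration.

```scala
// Hypothetical diagnostic: tries to load a class by name from the current
// classpath, similar to how Hive looks up a storage handler class.
object StorageHandlerCheck {
  // Class named in the error message from the question.
  val CassandraHandler = "org.apache.hadoop.hive.cassandra.CassandraStorageHandler"

  /** Right(location of the class) if it loads, Left(reason) otherwise. */
  def check(className: String): Either[String, String] =
    try {
      // initialize = false: only locate and load the class, do not run static initializers.
      val cls = Class.forName(className, false, Thread.currentThread().getContextClassLoader)
      // Report which JAR or directory the class came from, when the JVM knows it.
      val source = Option(cls.getProtectionDomain)
        .flatMap(pd => Option(pd.getCodeSource))
        .map(_.getLocation.toString)
        .getOrElse("unknown location")
      Right(source)
    } catch {
      case _: ClassNotFoundException => Left(s"$className is NOT on the classpath")
      case e: LinkageError           => Left(s"$className found but failed to link: $e")
    }

  def main(args: Array[String]): Unit =
    check(CassandraHandler) match {
      case Right(src) => println(s"Found $CassandraHandler in $src")
      case Left(msg)  => println(msg)
    }
}
```

If this reports the class as missing, adding the handler's JAR to Shark's classpath is the likely fix. If it finds the class, the problem lies elsewhere, for example in the Hive configuration.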

I have configured HIVE_HOME, HADOOP_HOME and SCALA_HOME. Perhaps I'm pointing HIVE_HOME and HADOOP_HOME to the wrong paths? HADOOP_HOME is set to my Cassandra Hadoop folder (/etc/dse/cassandra), HIVE_HOME is set to the hive folder of the unpacked AMPLab download (Hadoop1/hive), and I have also set HIVE_CONF_DIR to my Cassandra Hive path (/etc/dse/hive). Am I missing any steps, or have I configured these locations wrongly? Any ideas? Any help would be very much appreciated. Thanks.
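One way to rule out wrong paths is to print what each variable actually resolves to in the environment that launches Shark. Here is a minimal Scala sketch. It assumes it is run from the same shell session as the Shark CLI. The `EnvCheck` object is hypothetical, and it only checks that each path is an existing directory, not that the directory contains the right Hive or Hadoop build.

```scala
import java.io.File

// Hypothetical helper: reports whether each environment variable mentioned
// in the question is set and points at an existing directory.
object EnvCheck {
  val Vars = Seq("HADOOP_HOME", "HIVE_HOME", "HIVE_CONF_DIR", "SCALA_HOME")

  /** Describes one variable; `lookup` defaults to the real process environment. */
  def describe(name: String, lookup: String => Option[String] = sys.env.get): String =
    lookup(name) match {
      case None                                     => s"$name is not set"
      case Some(path) if new File(path).isDirectory => s"$name=$path (directory exists)"
      case Some(path)                               => s"$name=$path (NOT an existing directory)"
    }

  def main(args: Array[String]): Unit =
    Vars.foreach(v => println(describe(v)))
}
```

The `lookup` parameter is there so the check can be exercised without changing the real environment. In normal use, `EnvCheck.main` simply reads `sys.env`.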


Solution

  • Yes, I got it working.

    Try https://github.com/2013Commons/hive-cassandra

    which works with Cassandra 2.0.4, Hive 0.11 and Hadoop 2.0.