Tags: r, apache-spark, hive, sparkr

SparkR on Windows - Spark SQL is not built with Hive support


I'm trying to use Spark locally on my machine, and I was able to follow the tutorial at:

http://blog.sparkiq-labs.com/2015/07/26/installing-and-starting-sparkr-locally-on-windows-os-and-rstudio/

However, when I try to use Hive I get the following error:

Error in value[3L] : Spark SQL is not built with Hive support

The code:

## Set environment variables
Sys.setenv(SPARK_HOME = 'F:/Spark_build')

# Add SparkR to the library path
.libPaths(c(file.path(Sys.getenv('SPARK_HOME'), 'R', 'lib'), .libPaths()))

# Load SparkR
library(SparkR)

sc <- sparkR.init()
sqlContext <- sparkRHive.init(sc)  # this is the line that throws the error above

sparkR.stop()

At first I suspected the pre-built version of Spark was the problem, so I built my own with Maven, which took almost an hour:

mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 -Phive -Phive-thriftserver -DskipTests clean package

However, the error persists.
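
One way to check whether the -Phive profile actually made it into the build is to look for the Hive classes inside the assembly jar. A minimal sketch in R (the scala-2.10 assembly path is an assumption based on a default Spark 1.x Maven build):

# Locate the assembly jar produced by the Maven build (path is an assumption)
assembly <- Sys.glob("F:/Spark_build/assembly/target/scala-2.10/spark-assembly-*.jar")

# A jar is a zip archive, so base R's unzip() can list its contents
contents <- unzip(assembly[1], list = TRUE)$Name

# TRUE if the Hive module was compiled in
any(grepl("org/apache/spark/sql/hive/HiveContext", contents, fixed = TRUE))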


Solution

  • If you just followed the tutorial's instructions, you simply do not have Hive installed (try running hive from the command line). I have found that this is a common point of confusion for Spark beginners: "pre-built for Hadoop" does not mean that Spark needs Hadoop, let alone that it includes Hadoop (it does not), and the same holds for Hive. Until Hive is set up, you can keep following the tutorial with a plain SQLContext, as in the sketch below.
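
A minimal Hive-free sketch, assuming the same Spark 1.x SparkR API the tutorial uses; sparkRSQL.init creates a plain SQLContext, which is enough for DataFrames and SQL over temporary tables:

library(SparkR)

sc <- sparkR.init()
sqlContext <- sparkRSQL.init(sc)  # plain SQLContext, no Hive required

# Quick smoke test with a built-in R dataset
df <- createDataFrame(sqlContext, faithful)
registerTempTable(df, "faithful")
head(sql(sqlContext, "SELECT * FROM faithful WHERE waiting > 70"))

sparkR.stop()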