I am trying out SparkR with RStudio, but it doesn't seem to work. I have tried the solutions suggested on other questions, but I still can't figure out why it isn't running.
The code I am running is as follows:
if (nchar(Sys.getenv("SPARK_HOME")) < 1) {
  Sys.setenv(SPARK_HOME = "C:/spark")
}
library(SparkR, lib.loc = c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib")))
sc <- sparkR.session(
  master = "spark://192.168.56.1:7077",
  appName = "R Spark",
  sparkConfig = list(spark.cassandra.connection.host = "localhost"),
  sparkPackages = "datastax:spark-cassandra-connector:1.6.0-s_2.11"
)
df <- as.DataFrame(faithful)
showDF(df)
The error message I get is:
Error in invokeJava(isStatic = FALSE, objId$id, methodName, ...) :
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 4, 192.168.56.1): java.io.IOException: Cannot run program "Rscript":
CreateProcess error=2, Das System kann die angegebene Datei nicht finden (The system cannot find the file specified)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
at org.apache.spark.api.r.RRunner$.createRProcess(RRunner.scala:348)
at org.apache.spark.api.r.RRunner$.createRWorker(RRunner.scala:386)
at org.apache.spark.api.r.RRunner.compute(RRunner.scala:69)
at org.apache.spark.api.r.BaseRRDD.compute(RRDD.scala:50)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
at org.apache.spark.rdd.MapPartitionsRDD.
I am trying to run it on a standalone cluster with 1 worker.

Spark: 2.0.2
RStudio: 1.0.136
R: 3.3.2
I was having a similar problem under RStudio with a 2 node cluster.
The issue is that while the machine running your R driver program has R installed, your worker node doesn't (or at least doesn't have Rscript on its execution PATH). As a result, when Spark tries to run a piece of R code on the worker instead of the master, it fails to find Rscript.
Solution: install R (which provides Rscript) on your worker node, and make sure Rscript is on the PATH of the process that launches the Spark executor.
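As a quick sanity check, you can run something like the following on each worker node (a POSIX-shell sketch; on a Windows worker you would look for Rscript.exe instead). Spark's R runner starts worker processes by invoking the "Rscript" command, so it must resolve on that node:

```shell
# Check whether the Rscript binary Spark needs is visible on this node's PATH.
if command -v Rscript >/dev/null 2>&1; then
    echo "Rscript found: $(command -v Rscript)"
else
    echo "Rscript not found on PATH - install R on this node"
fi
```

If R is installed on the worker but not on the executor's PATH, another option (depending on your Spark version) is to set the spark.r.command configuration property to the full path of the Rscript binary, so Spark does not have to rely on PATH lookup.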
I hope this helps!