
Install Spark on Windows for sparklyr


I have tried several tutorials on setting up Spark and Hadoop in a Windows environment, especially alongside R. The first one resulted in this error by the time I reached figure 9:

[screenshot: error message from the first tutorial]

This tutorial from RStudio is giving me issues as well. When I reach the

sc <- spark_connect(master = "local")

step, I get this familiar error:

Error in force(code) : 
  Failed while connecting to sparklyr to port (8880) for sessionid (1652): Gateway in port (8880) did not respond.
    Path: C:\Users\jvangeete\spark-2.0.2-bin-hadoop2.7\bin\spark-submit2.cmd
    Parameters: --class, sparklyr.Backend, "C:\Users\jvangeete\Documents\R\win-library\3.3\sparklyr\java\sparklyr-2.0-2.11.jar", 8880, 1652


---- Output Log ----
The system cannot find the path specified.

---- Error Log ----

This port issue is similar to the one I get when I pass "yarn-client" as the master inside spark_connect(...) while following Ms. Zaidi's tutorial, here. (That tutorial has its own issues, which I've posted on a board, here, if anyone's interested.)

The TutorialsPoint walkthrough works fine for me if I first install an Ubuntu VM, but I'm using Microsoft R Open (MRO), so I'd like to figure this out on Windows, not least because Mr. Emaasit, in the first tutorial, is able to run .\bin\sparkR, a command I cannot.

Most generally, I am trying to understand how to install and run Spark together with R on Windows, preferably using sparklyr.
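For reference, this is the standard sparklyr setup I've been attempting; it's a minimal sketch, and the Spark version and JDK path are examples rather than values known to work on my machine:

```r
# Minimal sparklyr setup sketch on Windows (paths/versions are placeholders)
install.packages("sparklyr")   # from CRAN
library(sparklyr)

# Let sparklyr download and manage its own local Spark distribution
spark_install(version = "2.0.2")

# sparklyr requires a JDK; JAVA_HOME must point at it, e.g.:
# Sys.setenv(JAVA_HOME = "C:/Program Files/Java/jdk1.8.0_112")

sc <- spark_connect(master = "local")
```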

UPDATE 1: This is what's in the directories:

[screenshot: contents of the Spark directories]

UPDATE 2: This is my R session and system info:

platform       x86_64-w64-mingw32          
arch           x86_64                      
os             mingw32                     
system         x86_64, mingw32             
status                                     
major          3                           
minor          3.1                         
year           2016                        
month          06                          
day            21                          
svn rev        70800                       
language       R                           
version.string R version 3.3.1 (2016-06-21)
nickname       Bug in Your Hair   

[screenshot: system info]


Solution

    1. Download the Spark/Hadoop tarball from http://spark.apache.org/downloads.html
    2. Install the sparklyr package from CRAN
    3. Run spark_install_tar(tarfile = "path/to/spark_hadoop.tar")

    If you still get the error, untar the archive manually and set the SPARK_HOME environment variable to point to the untarred Spark path.

    Then try executing the following in the R console:

    library(sparklyr)
    sc <- spark_connect(master = "local")
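Putting the steps above together in one R session might look like this; the tarball filename and every "path/to" segment are placeholders that need to match where you actually saved the download:

```r
# Sketch of the steps above; all paths are placeholders
install.packages("sparklyr")   # step 2: install sparklyr from CRAN
library(sparklyr)

# step 3: install Spark from the manually downloaded tarball
spark_install_tar(tarfile = "C:/path/to/spark-2.0.2-bin-hadoop2.7.tgz")

# fallback: if the error persists, untar manually and point SPARK_HOME at it
# Sys.setenv(SPARK_HOME = "C:/path/to/spark-2.0.2-bin-hadoop2.7")

sc <- spark_connect(master = "local")
```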