I have tried several tutorials on setting up Spark and Hadoop in a Windows environment, especially alongside R. This one had already produced an error by the time I reached figure 9.
This tutorial from RStudio is giving me issues as well. When I get to the
sc <- spark_connect(master = "local")
step, I get this familiar error:
Error in force(code) :
Failed while connecting to sparklyr to port (8880) for sessionid (1652): Gateway in port (8880) did not respond.
Path: C:\Users\jvangeete\spark-2.0.2-bin-hadoop2.7\bin\spark-submit2.cmd
Parameters: --class, sparklyr.Backend, "C:\Users\jvangeete\Documents\R\win-library\3.3\sparklyr\java\sparklyr-2.0-2.11.jar", 8880, 1652
---- Output Log ----
The system cannot find the path specified.
---- Error Log ----
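Since the output says "The system cannot find the path specified," the first things I have been checking (my own diagnostics, not from any of the tutorials) are whether the path in the error log actually exists and what sparklyr thinks its Spark installation is:

```r
library(sparklyr)

# Does the spark-submit2.cmd path from the error log actually exist?
file.exists("C:/Users/jvangeete/spark-2.0.2-bin-hadoop2.7/bin/spark-submit2.cmd")

# Is SPARK_HOME set, and to what?
Sys.getenv("SPARK_HOME")

# Which Spark versions, if any, has sparklyr itself installed?
spark_installed_versions()
```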
I get a similar port error when I pass "yarn-client" as the master inside spark_connect(...), following Ms. Zaidi's tutorial, here. (That tutorial has its own issues, which I've posted on a board, here, if anyone's interested.)
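For what it's worth, my understanding (an assumption on my part, not something either tutorial states) is that master = "yarn-client" only works when Spark can locate a Hadoop/YARN configuration, roughly like this; the config path below is a placeholder, not from my machine:

```r
library(sparklyr)

# Assumption: yarn-client mode needs these to point at a real Hadoop
# configuration directory before spark_connect() is called.
Sys.setenv(HADOOP_CONF_DIR = "C:/hadoop/etc/hadoop")  # placeholder path
Sys.setenv(YARN_CONF_DIR   = "C:/hadoop/etc/hadoop")  # placeholder path

sc <- spark_connect(master = "yarn-client")
```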
The TutorialsPoint walkthrough works fine if I first install an Ubuntu VM, but I'm using Microsoft R Open (MRO), so I'd like to figure this out on Windows, not least because Mr. Emaasit, in the first tutorial, is able to run .\bin\sparkR, a command I cannot.
Most generally, I am trying to understand how to install and run Spark together with R on Windows, preferably using sparklyr.
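For reference, the overall flow I am attempting, as I understand it from the sparklyr documentation, is simply:

```r
install.packages("sparklyr")  # from CRAN
library(sparklyr)

# Let sparklyr download and manage its own copy of Spark
spark_install(version = "2.0.2")

# This is the step that fails for me with the gateway-port error
sc <- spark_connect(master = "local")
spark_disconnect(sc)
```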
platform x86_64-w64-mingw32
arch x86_64
os mingw32
system x86_64, mingw32
status
major 3
minor 3.1
year 2016
month 06
day 21
svn rev 70800
language R
version.string R version 3.3.1 (2016-06-21)
nickname Bug in Your Hair
If you are still getting the error, untar the tarball manually and set the SPARK_HOME environment variable to point to the extracted spark-hadoop directory.
Then try executing the following in the R console:
library(sparklyr)
sc <- spark_connect(master = "local")
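A sketch of what that looks like in one session, assuming Spark was extracted to the path shown in the question's error log (adjust the path to your machine):

```r
# Point SPARK_HOME at the manually extracted Spark directory
# (path taken from the question; substitute your own untar location)
Sys.setenv(SPARK_HOME = "C:/Users/jvangeete/spark-2.0.2-bin-hadoop2.7")

library(sparklyr)
sc <- spark_connect(master = "local")
```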