I'm trying to run sparklyr in my local environment to replicate a production environment, but I can't even get started. I successfully installed the latest version of Spark using spark_install(), but spark_connect() fails with this vague and unhelpful error:
> library(sparklyr)
> spark_installed_versions()
  spark hadoop dir
1 2.3.1    2.7 C:\\Users\\...\\AppData\\Local/spark/spark-2.3.1-bin-hadoop2.7
> spark_connect(master = "local")
Error in if (is.na(a)) return(-1L) : argument is of length zero
Here is what my session info looks like.
> sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] sparklyr_0.8.4.9003
loaded via a namespace (and not attached):
[1] Rcpp_0.12.17 dbplyr_1.2.1 compiler_3.5.0 pillar_1.2.3 later_0.7.3
[6] plyr_1.8.4 bindr_0.1.1 base64enc_0.1-3 tools_3.5.0 digest_0.6.15
[11] jsonlite_1.5 tibble_1.4.2 nlme_3.1-137 lattice_0.20-35 pkgconfig_2.0.1
[16] rlang_0.2.1 psych_1.8.4 shiny_1.1.0 DBI_1.0.0 rstudioapi_0.7
[21] yaml_2.1.19 parallel_3.5.0 bindrcpp_0.2.2 stringr_1.3.1 dplyr_0.7.5
[26] httr_1.3.1 rappdirs_0.3.1 rprojroot_1.3-2 grid_3.5.0 tidyselect_0.2.4
[31] glue_1.2.0 R6_2.2.2 foreign_0.8-70 reshape2_1.4.3 purrr_0.2.5
[36] tidyr_0.8.1 magrittr_1.5 backports_1.1.2 promises_1.0.1 htmltools_0.3.6
[41] assertthat_0.2.0 mnormt_1.5-5 mime_0.5 xtable_1.8-2 httpuv_1.4.3
[46] config_0.3 stringi_1.1.7 lazyeval_0.2.1 broom_0.4.4
Well, with a bit of guessing I was able to solve my problem: I had to set the SPARK_HOME environment variable manually.
spark_installed_versions()[1, 3] %>% spark_home_set()
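For anyone hitting the same error, here is a slightly more explicit sketch of the same fix. It assumes at least one Spark version was installed through spark_install(), and it selects the install directory by column name (dir) rather than by position. The variable names versions and spark_dir are only illustrative.

```r
library(sparklyr)

# List the locally installed Spark builds (columns: spark, hadoop, dir).
versions <- spark_installed_versions()

# Assumption: the first row is the installation you want to use.
spark_dir <- versions$dir[1]

# Point SPARK_HOME at that directory for the current R session.
spark_home_set(spark_dir)

# Check that the variable is now set before trying to connect.
Sys.getenv("SPARK_HOME")

sc <- spark_connect(master = "local")
```

It should also be possible to pass the directory directly with spark_connect(master = "local", spark_home = spark_dir), which avoids relying on the environment variable at all. Either way the setting only lasts for the current session, so it has to be repeated (or put in a startup file) after restarting R.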