I am trying to use sparklyr with arrow to improve performance, as seen for example here, but I am running into errors.
Here is a (hopefully) reproducible example:
# Prepare session and data
library(sparklyr)
library(dplyr)
config <- sparklyr::spark_config()
sc <- sparklyr::spark_connect(master = "local", config = config)
mtcars_sp <- dplyr::copy_to(sc, datasets::mtcars, overwrite = TRUE)
Using sparklyr without arrow works fine:
if ("arrow" %in% .packages()) detach("package:arrow")
mtcars_sp %>% sparklyr::spark_apply(function(df) df) %>% collect()
However, adding arrow to the mix and running the same produces errors:
library(arrow)
mtcars_sp %>% sparklyr::spark_apply(function(df) df) %>% collect()
The error message does not seem too helpful, but looking at the worker log I see:
ERROR sparklyr: RScript (6891) terminated unexpectedly: object 'as_tibble' not found
Relevant sessioninfo:
There's a newer version of sparklyr available, 1.0.2. It looks like there are some changes in that release that are needed to work with arrow 0.14.x. sparklyr's continuous integration with the latest version of arrow is passing.
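To check whether the installed sparklyr is older than 1.0.2 before retrying, something like the following sketch could be used. The helper names `version_below` and `needs_upgrade` are made up for illustration and are not part of sparklyr or arrow; the 1.0.2 threshold is taken from the note above.

```r
# Hypothetical helper: TRUE when version string `installed` is older than `minimum`.
version_below <- function(installed, minimum) {
  package_version(installed) < package_version(minimum)
}

# Hypothetical helper: TRUE when `pkg` is missing or older than `minimum`.
needs_upgrade <- function(pkg, minimum) {
  if (!requireNamespace(pkg, quietly = TRUE)) return(TRUE)
  version_below(as.character(utils::packageVersion(pkg)), minimum)
}

if (needs_upgrade("sparklyr", "1.0.2")) {
  message("sparklyr is missing or older than 1.0.2; consider install.packages(\"sparklyr\")")
}
```

After upgrading, it is probably safest to restart the R session and reconnect to Spark before running the spark_apply() example again.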