r, sparklyr, apache-arrow

Sparklyr R with apache arrow fails, terminated unexpectedly: object 'as_tibble' not found


I am trying to use sparklyr with arrow to improve performance (as seen for example here), but I am running into errors.

Here is a (hopefully) reproducible example:

# Prepare session and data
library(sparklyr)
library(dplyr)
config <- sparklyr::spark_config()
sc <- sparklyr::spark_connect(master = "local", config = config)
mtcars_sp <- dplyr::copy_to(sc, datasets::mtcars, overwrite = TRUE)

Using sparklyr without arrow works fine:

if ("arrow" %in% .packages()) detach("package:arrow")
mtcars_sp %>% sparklyr::spark_apply(function(df) df) %>% collect()

However, adding arrow to the mix and running the same produces errors:

library(arrow)
mtcars_sp %>% sparklyr::spark_apply(function(df) df) %>% collect()

The error message itself is not very helpful, but the worker log shows:

ERROR sparklyr: RScript (6891) terminated unexpectedly: object 'as_tibble' not found

Relevant session info:

  • R version 3.6.0, x86_64-redhat-linux-gnu (64-bit)
  • Packages: arrow_0.14.1, dplyr_0.8.3, sparklyr_1.0.1
  • Spark version 2.4.3

Solution

  • A newer version of sparklyr, 1.0.2, is available. That release appears to include changes needed to work with arrow 0.14.x, and sparklyr's continuous integration against the latest version of arrow is passing, so upgrading sparklyr from 1.0.1 is the first thing to try.
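
As a minimal sketch of checking whether the installed sparklyr predates 1.0.2 before retrying the arrow example: `needs_upgrade` is a hypothetical helper written for this illustration, not part of sparklyr, and the sketch assumes the fixed release can be installed from CRAN with `install.packages()`.

```r
# Hypothetical helper (not part of sparklyr): TRUE when the installed
# version is older than the minimum version named in the answer.
needs_upgrade <- function(installed, minimum = "1.0.2") {
  utils::compareVersion(as.character(installed), minimum) < 0
}

if (requireNamespace("sparklyr", quietly = TRUE)) {
  installed <- utils::packageVersion("sparklyr")
  if (needs_upgrade(installed)) {
    # Assumption: a release >= 1.0.2 is available from the configured repo.
    message("sparklyr ", installed, " is older than 1.0.2; ",
            "try install.packages(\"sparklyr\") and restart R.")
  } else {
    message("sparklyr ", installed, " is at least 1.0.2.")
  }
} else {
  message("sparklyr is not installed.")
}
```

After upgrading, restart the R session and reconnect with `spark_connect()` before rerunning the `spark_apply()` call, so that the workers pick up the new package version.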