I've worked in RStudio on a local machine for a couple of years, and I recently started working with Spark (version 3.0.1). I ran into an unexpected problem when I tried to run stringr::str_detect() in Spark: apparently str_detect() has no translation to Spark SQL. I am looking for an alternative, preferably in R.
Here is an example of my expected result when running str_detect() locally vs. in Spark.
# Load packages
library(dplyr)
library(stringr)
library(sparklyr)
# Example tibble
df <- tibble(foodtype = c("potatosalad", "potato", "salad"))
df
---
# A tibble: 3 x 1
foodtype
<chr>
1 potatosalad
2 potato
3 salad
---
# Expected result when using R
df %>%
  mutate(contains_potato = str_detect(foodtype, "potato"))
---
# A tibble: 3 x 2
foodtype contains_potato
<chr> <lgl>
1 potatosalad TRUE
2 potato TRUE
3 salad FALSE
---
But when I run this code on a Spark dataframe it returns the following error message: "Error: str_detect() is not available in this SQL variant".
# Connect to local Spark cluster
sc <- spark_connect(master = "local", version = "3.0")
# Copy tibble to Spark cluster
df_spark <- copy_to(sc, df)
df_spark
# Error when using str_detect with Spark
df_spark %>%
  mutate(contains_potato = str_detect(foodtype, "potato"))
---
Error: str_detect() is not available in this SQL variant
---
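The Spark counterpart of str_detect() is Spark SQL's rlike function, which returns true when a string matches a regular expression. Note that rlike uses Java regular expressions while stringr uses ICU, so elaborate patterns can behave slightly differently; a plain literal such as "potato" matches the same way in both.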
I don't use Spark with R, but something like this should work:
df_spark %>% mutate(contains_potato = foodtype %rlike% "potato")
Alternatively, dplyr passes any function it has no translation for straight through to Spark SQL, so you can call Spark functions as if they were R functions:
df_spark %>% mutate(contains_potato = rlike(foodtype, "potato"))
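To check what the pass-through produces without a running Spark cluster, dbplyr's `translate_sql()` can render the SQL for both forms above. This is a minimal sketch that assumes the dbplyr package is installed; it uses dbplyr's generic `simulate_dbi()` connection instead of a real Spark connection, so identifier quoting may differ slightly from what sparklyr actually sends. The point is only that both expressions reach the database as `rlike`.

```r
# Assumes the dbplyr package is installed; no Spark cluster is needed here.
library(dbplyr)

# A simulated generic connection stands in for the Spark connection,
# so quoting details may differ from what sparklyr generates.
con <- simulate_dbi()

# Unknown infix operators such as %rlike% become SQL infix operators
infix_sql <- translate_sql(foodtype %rlike% "potato", con = con)

# Unknown prefix functions such as rlike() are left as-is in the SQL
prefix_sql <- translate_sql(rlike(foodtype, "potato"), con = con)

print(infix_sql)
print(prefix_sql)
```

On a real Spark connection, appending `show_query()` to the pipeline (for example `df_spark %>% mutate(contains_potato = rlike(foodtype, "potato")) %>% show_query()`) prints the SQL that sparklyr generates.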