I've been trying to count pattern matches using sparklyr.
I'm trying to count the number of times the pattern ";" appears in the variable room_number.
Here is mytable:
room_number
A12;A19
A13
A15;A14;A20
When I don't use Sparklyr I can use this function:
count.matches <- function(pat, vec) sapply(regmatches(vec, gregexpr(pat, vec)), length)
mytable <- mytable %>%
mutate(number_pattern = mapply(count.matches, c(';'), list(room_number)))
I get:
room_number number_pattern
A12;A19 1
A13 0
A15;A14;A20 2
If I try to apply the code in distributed R with sparklyr, using spark_apply instead of mapply, I get the following message:
mytable <- mytable %>%
  mutate(number_pattern = spark_apply(count.matches, c(';'), list(room_number)))
glimpse(mytable)
Error in UseMethod("escape") : no applicable method for 'escape' applied to an object of class "function"
Do you have any tips? Thanks for helping me out.
spark_apply is a standalone function and cannot be used inside mutate. It also doesn't have the same API as mapply: it takes the table itself plus a function that receives a data frame and returns a data frame:
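# Returns a function of one data frame, which is the shape spark_apply expects
count.matches <- function(pat) function(df) {
  # Count the occurrences of pat in each element of vec
  f <- function(vec) sapply(regmatches(vec, gregexpr(pat, vec)), length)
  dplyr::mutate(df, number_pattern = f(room_number))
}
mytable %>% spark_apply(count.matches(";"))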
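
To check the logic without a Spark connection, the same kind of closure can be run on an ordinary local data frame. This is only a sketch: `partition` and `count_matches` are hypothetical names, the data frame is a stand-in for one partition of mytable, and it uses base R (`lengths` and `$<-`) instead of dplyr so it has no dependencies.

```r
# Local stand-in for one partition of the Spark table (assumption: the
# function given to spark_apply receives each partition as a plain data frame).
partition <- data.frame(
  room_number = c("A12;A19", "A13", "A15;A14;A20"),
  stringsAsFactors = FALSE
)

# Closure factory: fixes the pattern and returns a function of one data frame.
count_matches <- function(pat) function(df) {
  # regmatches() yields one character vector of matches per row;
  # rows without a match yield character(0), so their count is 0.
  hits <- regmatches(df$room_number, gregexpr(pat, df$room_number))
  df$number_pattern <- lengths(hits)
  df
}

result <- count_matches(";")(partition)
print(result)
```

If the counts look right locally, the same function shape (data frame in, data frame out) is what gets handed to spark_apply on the Spark table.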