
Count Pattern Matching using Sparklyr


I've been trying to count pattern matches using sparklyr.

I want to count the number of times the pattern ";" appears in the variable room_number.

Here is mytable:

room_number      
A12;A19        
A13            
A15;A14;A20 

When I don't use Sparklyr I can use this function:

count.matches <- function(pat, vec) sapply(regmatches(vec, gregexpr(pat, vec)), length)

mytable <- mytable %>%
  mutate(number_pattern = mapply(count.matches, c(';'), list(room_number)))

I get:

room_number    number_pattern    
A12;A19        1
A13            0
A15;A14;A20    2

If I try to apply the code in distributed R with sparklyr, using spark_apply instead of mapply, I get the following error:

mytable <- mytable %>%
  mutate(number_pattern = spark_apply(count.matches, c(';'), list(room_number)))
glimpse(mytable)

Error in UseMethod("escape") : no applicable method for 'escape' applied to an object of class "function"

Do you have any tips? Thanks for helping me out.


Solution

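  • spark_apply is a standalone function and cannot be used inside mutate. It also doesn't have the same API as mapply: it takes a Spark DataFrame and a function that receives each partition as an R data frame and returns a data frame: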

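    # Returns a function of one argument (a data frame partition)
    # that remembers the pattern to count.
    count.matches <- function(pat) function(df) {
      # Count the matches of pat in each element of vec
      f <- function(vec) sapply(regmatches(vec, gregexpr(pat, vec)), length)
      dplyr::mutate(df, number_pattern = f(room_number))
    }
    
    # Apply the generated function to every partition of the Spark table
    mytable %>% spark_apply(count.matches(";"))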
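
Because spark_apply hands each partition to the function as an ordinary R data frame, it can help to run the function locally first to check what it does. Below is a minimal sketch using base R only. The names count_matches and partition are hypothetical, and fixed = TRUE is an added choice (not part of the answer above) so that the pattern is matched literally rather than as a regular expression:

```r
# Closure factory: count_matches(pat) returns a function that takes one
# partition (a plain data frame) and adds a number_pattern column.
count_matches <- function(pat) function(df) {
  # fixed = TRUE is an assumption of this sketch: match pat literally
  hits <- gregexpr(pat, df$room_number, fixed = TRUE)
  # regmatches() returns one character vector of matches per row;
  # lengths() counts the matches in each of them
  df$number_pattern <- lengths(regmatches(df$room_number, hits))
  df
}

# Local stand-in for a single partition, using the sample data from the question
partition <- data.frame(
  room_number = c("A12;A19", "A13", "A15;A14;A20"),
  stringsAsFactors = FALSE
)

result <- count_matches(";")(partition)
print(result)  # number_pattern is 1, 0, 2 for the three rows
```

If the local result looks right, the same kind of function can be passed to spark_apply as in the answer above, provided it still returns a data frame.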