Tags: r, apache-spark, sparkr, sparklyr

Syntax to calculate the number of characters in SparkR


In R we use str_length() (from stringr) to count the characters in a value. What is the syntax for a similar operation in SparkR?

R syntax

library(stringr)

str_length(10)  # the number is coerced to the string "10"
#> [1] 2
str_length(9)
#> [1] 1

Solution

    df <- data.frame(x = c("abc", "abcdef"))
    

    In SparkR, use length(), which returns the character length of a string column.

    library(SparkR)
    sparkR.session()  # start or reuse a Spark session (skip if one is already provided)
    
    df_sparkr <- createDataFrame(df)
    head(select(df_sparkr, length(df_sparkr$x)))
    #>   length(x)
    #> 1         3
    #> 2         6
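
    To keep the original column next to the count, as the sparklyr output further down does, one option is SparkR's withColumn(). The following is a minimal sketch, assuming a local Spark installation is available; the column name n_chars is a hypothetical choice for illustration, not something from the question.

```r
library(SparkR)

# Assumption: Spark is installed locally. sparkR.session() starts a session
# or reuses one that is already running.
sparkR.session()

df <- data.frame(x = c("abc", "abcdef"), stringsAsFactors = FALSE)
df_sparkr <- createDataFrame(df)

# Add the character count as a new column; "n_chars" is a hypothetical name.
# length() here is SparkR's Column method, not base R's length().
df_len <- withColumn(df_sparkr, "n_chars", length(df_sparkr$x))

# Bring the small result back into a local R data frame and show it
result <- collect(df_len)
print(result)
```

    Note that library(SparkR) masks several base R functions, so length() applied to a Spark column returns a Column expression rather than a number; it is only evaluated when the result is collected or displayed.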
    

    In sparklyr, use nchar() inside mutate().

    library(tidyverse)
    library(sparklyr)
    
    sc <- sparklyr::spark_connect(method = "local")
    
    df_sparklyr <- copy_to(sc, df)
    
    df_sparklyr %>%
      mutate(length = nchar(x))
    #> # Source: spark<?> [?? x 2]
    #>   x      length
    #>   <chr>   <int>
    #> 1 abc         3
    #> 2 abcdef      6