r, sparklyr, stringdist

How to calculate distance between strings using sparklyr?


I need to calculate the distance between two strings in R using sparklyr. Is there a way to use stringdist or another package? I want to use cosine distance, which is available as a method of the stringdist function.

Thanks in advance.


Solution

  • Spark SQL has a built-in levenshtein function, and dplyr passes it through to Spark unchanged:

    library(sparklyr)
    library(dplyr)
    
    sc <- spark_connect(master = "local")
    df <- copy_to(sc, data.frame(a = c("This is it", "Foo"), b = c("This is", "foobar")))
    
    df %>% mutate(dist = levenshtein(a, b))
    # # Source:   lazy query [?? x 3]
    # # Database: spark_connection
    #   a          b        dist
    #   <chr>      <chr>   <int>
    # 1 This is it This is     3
    # 2 Foo        foobar      4
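levenshtein covers edit distance, but the question asks for cosine distance. As far as I know, Spark SQL has no built-in cosine string distance, so one option is to run stringdist itself on each partition with sparklyr's spark_apply(). This is a minimal sketch. It assumes a local Spark connection and that the stringdist package is installed wherever the R workers run (with master = "local" that is the same machine). The table name strings_tbl is just an illustration:

```r
library(sparklyr)
library(dplyr)

sc <- spark_connect(master = "local")

# Same example data as above; "strings_tbl" is an arbitrary table name.
strings_tbl <- copy_to(
  sc,
  data.frame(a = c("This is it", "Foo"), b = c("This is", "foobar"),
             stringsAsFactors = FALSE),
  name = "strings_tbl",
  overwrite = TRUE
)

# spark_apply() hands each partition to an R process as a plain data.frame,
# so any installed R package can be used inside the function. Here stringdist
# computes the q-gram cosine distance (q = 1 by default) for each (a, b) pair.
dist_tbl <- spark_apply(
  strings_tbl,
  function(d) {
    d$dist <- stringdist::stringdist(d$a, d$b, method = "cosine")
    d
  }
)

dist_tbl
```

spark_apply() is considerably slower than a built-in SQL function because every partition is serialized to and from R. Prefer levenshtein when edit distance is acceptable, and call spark_disconnect(sc) when you are done.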