Tags: r, string, levenshtein-distance

Trying to find a way to use adist() for words instead of characters in R


I'd like adist() to work on words the same way it works on characters. That is, each deletion, substitution, or insertion should apply to a whole word rather than to a single character. For example, I want "Alert 12 went off at 3am" and "Alert 17 was heard at 3am" to have a Levenshtein distance of 3, because three word substitutions are needed to turn one string into the other. Thanks.


Solution

  • You can try the following code, which counts the words of s1 that have no matching word in s2. Note that this is a multiset difference, so word order is ignored:

    library(vecsets)
    # split both strings on spaces, then count the words of s1 left unmatched in s2
    d <- length(vsetdiff(unlist(strsplit(s1, " ")), unlist(strsplit(s2, " "))))
    

    which gives

    > d
    [1] 3
    

    Data

    s1 <- "Alert 12 went off at 3am"
    s2 <- "Alert 17 was heard at 3am"
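
The count above matches the expected result for this example, but it compares the words as multisets. Reordered words are not penalised, and words that appear only in s2 are not counted. If a true word-level edit distance is needed, a minimal sketch in base R is the standard Levenshtein dynamic programme run over word vectors instead of characters. The helper name word_levenshtein is hypothetical (it is not part of base R or vecsets), and the sketch assumes words are separated by whitespace:

```r
# Hypothetical helper: Levenshtein distance where the unit of edit is a word.
word_levenshtein <- function(a, b) {
  # split on runs of whitespace after trimming the ends
  x <- strsplit(trimws(a), "\\s+")[[1]]
  y <- strsplit(trimws(b), "\\s+")[[1]]
  n <- length(x)
  m <- length(y)

  # d[i + 1, j + 1] holds the distance between the first i words of x
  # and the first j words of y
  d <- matrix(0L, nrow = n + 1, ncol = m + 1)
  d[, 1] <- 0:n   # delete i words to reach an empty sequence
  d[1, ] <- 0:m   # insert j words starting from an empty sequence

  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      cost <- if (x[i] == y[j]) 0L else 1L
      d[i + 1, j + 1] <- min(
        d[i, j + 1] + 1L,  # deletion
        d[i + 1, j] + 1L,  # insertion
        d[i, j] + cost     # substitution (or match)
      )
    }
  }
  d[n + 1, m + 1]
}

s1 <- "Alert 12 went off at 3am"
s2 <- "Alert 17 was heard at 3am"
word_levenshtein(s1, s2)            # 3
word_levenshtein("a b c", "c b a")  # 2, word order matters here
```

On the example from the question both approaches return 3. They differ once words are reordered: in the second call the two strings contain the same words, so a set-based count finds no difference, while the edit distance is 2.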