r, string-matching, fuzzy-comparison

What is the best method for fuzzy matching all elements of a single vector or column against all the elements within that same vector or column?


For example, if I had a data.frame such as

df <- data.frame(Name = c('Chris','Christopher','John','Jon','Jonathan'))

Is there a way for me to build a similarity matrix comparing how similar each individual name is to every other name in the 'Name' column?

I've tried using a loop, but I'm not sure how to apply this across the entire column:

for(i in 1:nrow(df)){
  df$distance[i] <- adist(df$Name[i], df$Name[i+1])
}

Solution

  • I got @zephryl's solution to work with some minor edits:

    df <- data.frame('Name' = c('Chris','Christopher','John','Jon','Jonathan'))
    
    distances <- adist(df$Name)
    distances <- as.data.frame(distances)
    rownames(distances) <- df$Name
    colnames(distances) <- df$Name
    
    distances
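
Note that `adist()` returns edit distances (0 means identical, larger means less similar), not similarities. If a 0-to-1 similarity score is preferred, one possible approach is to divide each distance by the length of the longer string in the pair and subtract from 1. This is only a sketch of one common normalization, not part of the original answer; the names `names_vec`, `dist_mat`, `max_len`, and `sim_mat` are made up for illustration.

```r
# Example names from the question
names_vec <- c("Chris", "Christopher", "John", "Jon", "Jonathan")

# Pairwise generalized Levenshtein distances (utils::adist, base R)
dist_mat <- adist(names_vec)

# Length of the longer string in every pair, used as the normalizer
# (assumes no empty strings, otherwise this would divide by zero)
max_len <- outer(nchar(names_vec), nchar(names_vec), pmax)

# Similarity in [0, 1]: 1 means identical strings
sim_mat <- 1 - dist_mat / max_len
dimnames(sim_mat) <- list(names_vec, names_vec)

round(sim_mat, 2)
```

With this scaling, 'John' vs 'Jon' (one edit, longer string has 4 characters) scores 0.75. Other normalizations, or other metrics such as Jaro-Winkler from an add-on package, may suit name matching better depending on the data.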