
Jaccard Similarity between strings using a for loop in R


I am trying to compute the Jaccard similarity between each pair of names in two large vectors of names (see below for a small example) and store the results in a matrix. My function just returns NULL. What am I doing wrong?

library(dplyr)

df = data.frame(matrix(NA, ncol=3, nrow=3))
df = df %>%
    mutate_if(is.logical, as.numeric)

names(df) = c("A.J. Doyle", "A.J. Graham", "A.J. Porter")
draft_names = names(df) 
row.names(df) = c("A.J. Feeley", "A.J. McCarron", "Aaron Brooks")
quarterback_names = row.names(df)

library(stringdist)

jaccard_similarity = function(d){
  for (i in 1:nrow(d)){
    for(j in 1:ncol(d)){
      d[i,j] = stringdist(quarterback_names[i], draft_names[j], method ='jaccard', q=2)
    }
  }
}

df = jaccard_similarity(df)

Solution

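  • In R, a function returns the value of the last expression it evaluates. In your function that is the outer for loop, and a for loop always evaluates to NULL. Because R modifies a local copy of d rather than df itself, the filled-in values are lost when the function exits. You need to return the changed data frame explicitly: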

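    jaccard_similarity = function(d){
      # Uses quarterback_names (rows) and draft_names (columns) from the global environment
      for (i in 1:nrow(d)){
        for(j in 1:ncol(d)){
          # Jaccard distance between the sets of bigrams (q = 2) of the two names
          d[i,j] = stringdist(quarterback_names[i], draft_names[j], method ='jaccard', q=2)
        }
      }
      return(d)  # the fix: return the modified copy instead of the loop's NULL
    }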
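
    Afterwards, jaccard_similarity(df) yields the matrix below. Note that stringdist() with method = 'jaccard' returns the Jaccard distance, i.e. 1 minus the Jaccard similarity: 0 means identical bigram sets and 1 means no bigrams in common (as for "Aaron Brooks" here). Use 1 - jaccard_similarity(df) if you want similarities.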

                  A.J. Doyle A.J. Graham A.J. Porter
    A.J. Feeley    0.6428571   0.7500000   0.7500000
    A.J. McCarron  0.7647059   0.7777778   0.7777778
    Aaron Brooks   1.0000000   1.0000000   1.0000000
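The explicit loop is not strictly necessary. As a minimal sketch, assuming the stringdist package's stringdistmatrix() function (not used in the original answer), the whole matrix can be computed in one vectorized call and then converted from distances to similarities. The names dist_mat and sim_mat are illustrative only:

```r
library(stringdist)

# Same example vectors as in the question
quarterback_names = c("A.J. Feeley", "A.J. McCarron", "Aaron Brooks")
draft_names = c("A.J. Doyle", "A.J. Graham", "A.J. Porter")

# One call computes all pairwise Jaccard distances on bigrams (q = 2);
# rows follow quarterback_names, columns follow draft_names
dist_mat = stringdistmatrix(quarterback_names, draft_names, method = "jaccard", q = 2)
dimnames(dist_mat) = list(quarterback_names, draft_names)

# Jaccard similarity is 1 minus the Jaccard distance
sim_mat = 1 - dist_mat
print(round(sim_mat, 4))
```

When both arguments are given, stringdistmatrix() returns a length(a) by length(b) matrix, so the layout matches the loop version; setting dimnames by hand avoids depending on the package's useNames option.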