r, levenshtein-distance

Using adist on two columns of a data frame


I want to use adist to calculate the edit distance between the values of two columns within each row.

I am using it in more-or-less this way:

A <- c("mad","car")
B <- c("mug","cat")
my_df <- data.frame(A,B)
my_df$dist <- adist(my_df$A, my_df$B, ignore.case = TRUE)
my_df <- my_df[order(my_df$dist),]

The last two lines are the same as in my real code, but my actual data frame is a bit different: its columns are character type, not factor. Also, the dist column comes back as a 2-column matrix, and I have no idea why.

Update: I have read a bit more and found that I need to apply it over the rows, so my new code is as follows:

apply(my_df, 1, function(d) adist(d[1], d[2]))

It works fine, but for my original dataset, referring to columns by number is impractical. How can I refer to columns by name in this function?


Solution

  • You can get around that by using mapply, which calls adist once per pair of corresponding elements, i.e.

    mapply(adist, my_df$A, my_df$B)
    #[1] 2 1
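
For context on the 2-column matrix: adist compares every element of its first argument with every element of its second, so two length-2 vectors produce a 2 x 2 matrix, and the row-wise distances are its diagonal. The sketch below is only an illustration on the toy data from the question, assuming character columns as in the asker's real data. The names all_pairs, row_dist_diag and by_name are made up for the example. It also shows one way to pick columns by name inside apply.

```r
# Toy data from the question; stringsAsFactors = FALSE keeps the columns as character.
my_df <- data.frame(A = c("mad", "car"), B = c("mug", "cat"),
                    stringsAsFactors = FALSE)

# adist() compares every element of x with every element of y,
# so two length-2 vectors give a 2 x 2 matrix.
all_pairs <- adist(my_df$A, my_df$B, ignore.case = TRUE)

# The row-wise distances sit on the diagonal of that matrix.
row_dist_diag <- diag(all_pairs)

# mapply() computes only the row-wise pairs. MoreArgs passes ignore.case on
# to every call; USE.NAMES = FALSE drops the names that mapply() would
# otherwise take from the character input.
my_df$dist <- mapply(adist, my_df$A, my_df$B,
                     MoreArgs = list(ignore.case = TRUE), USE.NAMES = FALSE)

# apply() hands each row to the function as a named character vector,
# so columns can be picked by name instead of by position.
by_name <- apply(my_df[, c("A", "B")], 1, function(d) adist(d["A"], d["B"]))

# Sort by the new column. Use my_df$dist here; a bare `dist` would refer
# to the function stats::dist.
my_df <- my_df[order(my_df$dist), ]
```

Computing the full matrix and taking diag() is fine for small inputs, but it does n x n comparisons, so mapply is probably the better fit for a large data frame.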