I want to use adist to calculate edit distance between the values of two columns in each row.
I am using it in more-or-less this way:
A <- c("mad","car")
B <- c("mug","cat")
my_df <- data.frame(A,B)
my_df$dist <- adist(my_df$A, my_df$B, ignore.case = TRUE)
my_df <- my_df[order(dist),]
The last two rows are the same as in my case, but the actual data frame looks a bit different - columns of my original data frame are character type, not factor. Also, the dist column seems to be returned as 2-column matrix, I have no idea why it happens.
Update: I have read a bit and found that I need to apply it over the rows, so my new code is following:
apply(my_df, 1, function(d) adist(d[1], d[2]))
It works fine, but for my original dataset calling it by column numbers is inpractical, how can I refer to column names in this function?
You can overcome that problem by using mapply
, i.e.
mapply(adist, df$A, df$B)
#[1] 2 1