Smartest way to double-loop over a data frame (comparing rows to each other with a Levenshtein distance) in R?


I built a data frame of parameter strings, one per record:

             idName                                          Str
1         Аэрофлот_Эконом 95111000210102121111010100111000100110101001
2        Аэрофлот_Комфорт 95111000210102121111010100111000100110101001
3         Аэрофлот_Бизнес 96111000210102121111010100111000100110101001
4       Трансаэро_Дисконт 26111000210102120000010100001010000010001000
5 Трансаэро_Туристический 26111000210002120000010100001010000010001000
6        Трансаэро_Эконом 26111000210002120000010100001010000010001000

Now I need to compare each string against all the others using a Levenshtein distance function of the form function(str1, str2), so the obvious approach is a double loop. However, I am fairly sure there is a neat vectorised (apply/lapply/sapply) way of doing this, but I couldn't find any similar solutions.


Solution

  • The function adist (in the utils package that ships with R) computes a generalized Levenshtein distance. Is that what you need?

    Assuming your data is in a data.frame called mydf, adist(mydf$Str) returns a square matrix holding the distance between every pair of strings in the Str column, so no explicit loop is needed.
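
As a minimal sketch of the suggestion above, the following rebuilds the Str column from the question (the idName column is left out for brevity, and the name mydf is simply the one assumed in the answer) and calls adist once. The second part is an assumption beyond the answer: if the distance has to come from some other two-argument function, outer() combined with Vectorize() is one common way to avoid writing the double loop by hand. The function lev below is a hypothetical stand-in for such a function(str1, str2).

```r
# Str values copied from the question; row i here is row i of the original df.
mydf <- data.frame(
  Str = c(
    "95111000210102121111010100111000100110101001",
    "95111000210102121111010100111000100110101001",
    "96111000210102121111010100111000100110101001",
    "26111000210102120000010100001010000010001000",
    "26111000210002120000010100001010000010001000",
    "26111000210002120000010100001010000010001000"
  ),
  stringsAsFactors = FALSE
)

# One call gives the full pairwise matrix: d[i, j] is the generalized
# Levenshtein distance between mydf$Str[i] and mydf$Str[j].
d <- utils::adist(mydf$Str)

# Hypothetical stand-in for any scalar distance function(str1, str2).
lev <- function(str1, str2) drop(utils::adist(str1, str2))

# Vectorize() lets outer() apply the scalar function to every pair of rows.
d2 <- outer(mydf$Str, mydf$Str, Vectorize(lev))
```

Here d and d2 hold the same values; the adist call is the simpler choice whenever its notion of distance is the one you want, while the outer/Vectorize form still evaluates the function once per pair and is only a convenience over the explicit loop.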