I want to compute the Levenshtein distance between two arbitrary sequences, for example:
a <- 1:100
b <- c(1, 1:100)
edit_distance(a, b) == 1   # desired result: TRUE, since b is a with one element inserted
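For reference, the distance asked for here can also be computed directly on any pair of vectors with the standard Wagner-Fischer dynamic program. The sketch below is a hypothetical plain-R edit_distance() (it is not a function from base R or any package). It assumes unit costs and no NA values, and it runs in O(length(a) * length(b)) time in interpreted R, so it is only practical for modest lengths. Its advantage is that it needs no encoding step at all.

```r
# Hypothetical helper: Levenshtein distance between two arbitrary vectors,
# computed with the Wagner-Fischer dynamic program (unit costs for
# insertion, deletion and substitution). Elements are compared with `==`,
# so NA values are assumed to be absent.
edit_distance <- function(a, b) {
  n <- length(a)
  m <- length(b)
  # prev[j + 1] holds the distance between a[seq_len(i - 1)] and b[seq_len(j)]
  prev <- 0:m
  for (i in seq_len(n)) {
    curr <- integer(m + 1)
    curr[1] <- i  # distance from a[1:i] to the empty prefix of b
    for (j in seq_len(m)) {
      cost <- if (a[i] == b[j]) 0L else 1L
      curr[j + 1] <- min(prev[j + 1] + 1L,  # delete a[i]
                         curr[j] + 1L,      # insert b[j]
                         prev[j] + cost)    # substitute (or match)
    }
    prev <- curr
  }
  prev[m + 1]
}

a <- 1:100
b <- c(1, 1:100)
edit_distance(a, b) == 1
# [1] TRUE
```

Only two rows of the table are kept at a time, so memory stays O(length(b)) even though time is quadratic.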
I am aware of the adist function and the stringdist package, but they only work on character vectors. If the number of symbols in the sequences were small, I could just encode them as characters and use the above functions.
But there will typically be on the order of 1000 different symbols. Another option would be to encode them as Unicode characters (adist works on them: adist("\U00001", "\U00001\U00002")), but I don't know how to do this.
You can use intToUtf8 to map your integers to Unicode code points, so that each sequence becomes a single string with one character per element:
a2 <- intToUtf8(a)
b2 <- intToUtf8(b)
adist(a2, b2)
#      [,1]
# [1,]    1
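The same idea extends to symbols that are not small positive integers (strings, factor levels, large IDs) and to many sequences at once. A sketch, assuming the sequences are stored in a list: map every distinct symbol to a dense code 1..k with match(), encode each sequence with intToUtf8(), and let adist() with a single argument compute all pairwise distances. Dense codes sidestep two pitfalls of encoding raw values: a 0 is silently dropped from the resulting string, and code points in the UTF-16 surrogate range 0xD800-0xDFFF are invalid and give NA. This works as long as there are fewer than 55,296 distinct symbols, which comfortably covers ~1000. The name encode_seqs() is hypothetical.

```r
# Hypothetical helper: encode a list of sequences (of any atomic type)
# as UTF-8 strings, one character per symbol, so that adist() can be used.
encode_seqs <- function(seqs) {
  # Dense codes 1..k for the k distinct symbols across all sequences;
  # this never yields 0 or a surrogate code point while k < 55296.
  symbols <- unique(unlist(seqs, use.names = FALSE))
  vapply(seqs,
         function(s) intToUtf8(match(s, symbols)),
         character(1))
}

seqs <- list(
  x = c("alpha", "beta", "gamma"),
  y = c("alpha", "gamma"),
  z = c("beta", "beta", "gamma", "delta")
)
enc <- encode_seqs(seqs)
d <- adist(enc)  # 3 x 3 matrix of pairwise Levenshtein distances
# x-y: 1, x-z: 2, y-z: 3
```

Because the codes are assigned over the union of all sequences, the same symbol always maps to the same character, which is what makes the distances comparable across pairs.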