Search code examples
pythonstringcharacteredit-distance

Fast method to quantify number of character transpositions between strings (python or libraries within)


Consider we have two strings:

    x1 = "abcdef"
    x2 = "abdcfe"

    x1 == x2 # return False

My goal is to find how many transpositions there are between these two string having the same characters. In the above example there's 2 or 4 depending on how you view it (still an even number so either way works). Another way is to sort the characters in the string then comparing like so:

    x1s = ''.join(sorted(x1)) # 'abcdef'
    x2s = ''.join(sorted(x2)) # 'abcdef'

    x1s  == x2s # returns True of course

This way, loses the quantity of transpositions. Can't think how plain Levenshtein can help with this because the number of edits using other than the same characters available have the same weight. e.g.

    #pip install python-Levenshtein # you'll need this
    from Levenshtein import distance

    distance(x1, x2) # gives 3
    distance(x1s, x2s) # gives 0

Any ideas?


Solution

  • Alright, worked out an answer and here it is:

        len([i for i, j in zip(x1, x2) if i != j])
    

    And this returns the transposition count.