
Jaro Similarity


To compute the Jaro similarity, I found the matching characters as follows:

matching characters in string 1 :  AABABCAAAC
matching characters in string 2 :  ABAACBAAAC

What is the value of t (half the number of transpositions) for these strings? (source: Wikipedia)


Solution

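  • In this context, transpositions are the matching characters that sit at different positions in the two matched sequences: compare the sequences position by position and count every position where the characters differ. t is half of that count.

    Using the formula from Wikipedia: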

    m = 10
    t = 4/2 = 2
    |S1| = 10
    |S2| = 10
    d = 1/3 * (m/|S1| + m/|S2| + (m-t)/m)
      = 1/3 * (10/10 + 10/10 + (10-2)/10) ≈ 0.933
    

    The mismatched pairs (string 1 / string 2) are [A/B, B/A, B/C, C/B], so t is calculated as |[A/B, B/A, B/C, C/B]| / 2 = 4 / 2 = 2.
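
The calculation above can be checked with a short Python sketch. It assumes the matching characters have already been extracted in order from each string, as in the question; it does not implement the match-window search itself. The function names `half_transpositions` and `jaro_from_matches` are made up for this illustration.

```python
def half_transpositions(matched1, matched2):
    """Return t: half the number of positions where the matched sequences differ."""
    if len(matched1) != len(matched2):
        raise ValueError("matched sequences must have the same length")
    # Compare the two matched sequences position by position.
    mismatches = sum(a != b for a, b in zip(matched1, matched2))
    return mismatches / 2


def jaro_from_matches(matched1, matched2, len1, len2):
    """Jaro similarity from pre-computed matched sequences and original lengths."""
    m = len(matched1)
    if m == 0:
        return 0.0  # no matching characters means similarity 0
    t = half_transpositions(matched1, matched2)
    return (m / len1 + m / len2 + (m - t) / m) / 3


# Matched sequences from the question; both original strings have length 10.
s1_matched = "AABABCAAAC"
s2_matched = "ABAACBAAAC"

t = half_transpositions(s1_matched, s2_matched)
d = jaro_from_matches(s1_matched, s2_matched, 10, 10)
print(t)            # 2.0
print(round(d, 3))  # 0.933
```

Note that some implementations use integer division when halving the mismatch count; here the count is 4, so both approaches give t = 2.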