python string for-loop similarity

How to determine how similar two strings are (until a certain point)


I have a list of strings ['49275', '49287', '69674', '43924']

I want to see how similar they are to a certain value (let's say '49375'), BUT once there is a difference, everything past the difference needs to be counted as NOT similar (even if the later characters match).

So '49375' and '49275' should have a similarity of 0.4, NOT 0.8: only the first two of the five characters match before the first difference.

I tried the code below but I am getting stumped and there must be a better way.

l = ['49275', '49287', '69674', '43924']
x = '49375'

listy = []
for i in l:
  for n in range(len(x)):
    if x[n] == i[0][n]:
      listy.append((n+1)/len(x))
    if x[n] != i[0][n]:
      break

I would like the output to be a list of similarity numbers, i.e. [0.4, 0.4, 0, 0.2]

Thank You!


Solution

  • You were close. Append to listy at the point where the characters first differ (i.e. just before the break); if the loop completes without a break, append 1.0 instead.

    Note also that you want i[n] rather than i[0][n]. Here i is already a string, so i[0] is its first character, a string of length 1, and i[0][n] raises an IndexError as soon as n reaches 1.

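    l = ['49275', '49287', '69674', '43924']
    x = '49375'
    
    # Assumes every string in l is at least as long as x.
    listy = []
    for i in l:
        for n in range(len(x)):
            if x[n] != i[n]:
                # First mismatch at index n: the n characters before it matched.
                listy.append(n / len(x))
                break
        else:
            # The loop finished without a break, so every character matched.
            listy.append(1.0)
    
    print(listy)  # [0.4, 0.4, 0.0, 0.2]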
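
As an alternative to the explicit for/else loop, the same "count matching characters until the first difference" idea can be sketched with itertools.takewhile. This is only one possible way to write it, not the answer's own method: the function name prefix_similarity and the variable names strings and target are made up for illustration, and returning 0.0 for an empty target is an assumption about the desired behaviour.

```python
from itertools import takewhile


def prefix_similarity(candidate, target):
    """Return the fraction of `target` covered by the common leading characters.

    Hypothetical helper for illustration; an empty target is treated as 0.0.
    """
    if not target:
        return 0.0
    # zip pairs characters position by position and stops at the shorter string;
    # takewhile stops at the first pair that differs.
    matching_pairs = takewhile(lambda pair: pair[0] == pair[1], zip(candidate, target))
    matched = sum(1 for _ in matching_pairs)
    return matched / len(target)


strings = ['49275', '49287', '69674', '43924']
target = '49375'

similarities = [prefix_similarity(s, target) for s in strings]
print(similarities)  # [0.4, 0.4, 0.0, 0.2]
```

Because zip stops at the shorter of the two strings, this version does not raise an IndexError when a candidate is shorter than the target; the missing characters simply count as not matching.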