python pandas levenshtein-distance

How to calculate Levenshtein distance for every unique value using a for loop on a dataframe in pandas


I am trying to calculate the Levenshtein distance between values in a dataframe using a for loop.

df2_2=df2_1[['Concat','Count','ffour']].copy()
for a in df2_2['Concat'].unique():
    dw2_2=df2_2[df2_2['Concat']==a]
    vv = dw2_2.iloc[:, 1::2].values
    iRow, iCol = np.unravel_index(vv.argmax(), vv.shape)
    iCol = iCol * 2 + 1
    result = dw2_2.iloc[iRow, [0, iCol, iCol + 1]]
    b=result.copy()
    b=b.drop(labels=['Concat','Count'])
    print (b)
    b=b.astype(str)
    for a1 in df2_2['ffour'].unique():
        dw2_1=df2_2[df2_2['ffour']==a1]
        c= dw2_1['ffour'].copy()
        print (c)
        c=c.astype(str)
        for i in range (len(b)):
            distance=lev.distance(b,c)
            print (distance)
            ratio=lev.ratio(b,c)
            print (ratio)

I am getting this error:

  File "<ipython-input-129-15900bf3d493>", line 17, in <module>
    distance=lev.distance(b,c)

TypeError: distance expected two Strings or two Unicodes

I need help with this.


Solution

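  • I suggest checking the values and types of both b and c. The error says that lev.distance expects two strings, but here b and c are pandas Series: astype(str) converts the elements to strings, not the Series itself. Wrapping both in str() makes the error go away, though note that str() of a Series returns its printed representation (index and dtype included), so the distance may not be the one you want.
    Like this: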

    distance=lev.distance(str(b),str(c))
    

    Alternatively, convert every value in the Concat column to a string up front, so that the column holds only strings:

    df2_2['Concat'] = df2_2['Concat'].astype(str)
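
If the goal is a distance between individual values rather than between printed Series, one option is to pull out scalar strings before calling the distance function. The sketch below is only an illustration under several assumptions: the sample data is made up (only the column names Concat, Count and ffour come from the question), the reference value for each Concat group is taken to be the ffour of the row with the highest Count (my reading of the loop in the question), and edit_distance is a plain-Python stand-in for lev.distance so the example runs without the Levenshtein package.

```python
import pandas as pd


def edit_distance(s: str, t: str) -> int:
    """Dynamic-programming Levenshtein distance (stand-in for lev.distance)."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "Concat": ["A", "A", "B", "B"],
    "Count": [5, 2, 1, 7],
    "ffour": ["1234", "1235", "9876", "9870"],
})

# Reference value per group: the 'ffour' of the row with the highest 'Count'.
top_rows = df.loc[df.groupby("Concat")["Count"].idxmax()]
reference = top_rows.set_index("Concat")["ffour"].astype(str)

# Compare scalar strings, never whole Series.
rows = []
for concat_value, ref in reference.items():
    for candidate in df["ffour"].astype(str).unique():
        rows.append((concat_value, ref, candidate, edit_distance(ref, candidate)))

distances = pd.DataFrame(rows, columns=["Concat", "reference", "candidate", "distance"])
print(distances)
```

Because ref and candidate are ordinary Python strings at the point of comparison, edit_distance(ref, candidate) could be swapped for lev.distance(ref, candidate) or lev.ratio(ref, candidate) if the Levenshtein package is installed.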