I am trying to calculate the Levenshtein distance in a dataframe using a for loop.
df2_2=df2_1[['Concat','Count','ffour']].copy()
for a in df2_2['Concat'].unique():
    dw2_2=df2_2[df2_2['Concat']==a]
    vv = dw2_2.iloc[:, 1::2].values
    iRow, iCol = np.unravel_index(vv.argmax(), vv.shape)
    iCol = iCol * 2 + 1
    result = dw2_2.iloc[iRow, [0, iCol, iCol + 1]]
    b=result.copy()
    b=b.drop(labels=['Concat','Count'])
    print (b)
    b=b.astype(str)
    for a1 in df2_2['ffour'].unique():
        dw2_1=df2_2[df2_2['ffour']==a1]
        c= dw2_1['ffour'].copy()
        print (c)
        c=c.astype(str)
        for i in range (len(b)):
            distance=lev.distance(b,c)
            print (distance)
            ratio=lev.ratio(b,c)
            print (ratio)
I am getting this error:
File "<ipython-input-129-15900bf3d493>", line 17, in <module>
distance=lev.distance(b,c)
TypeError: distance expected two Strings or two Unicodes
I need help with this.
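I suggest checking the values of both b and c. The error means lev.distance received something other than two strings, and here both b and c are still pandas Series, even after astype(str).
You can always just use str(b) and str(c), and it might do the trick, at least as far as the TypeError goes (str() on a whole Series also includes the index and dtype text, so check that the result is what you want).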
Like this:
distance=lev.distance(str(b),str(c))
Or you can apply str() to all the values in the Concat column, to ensure that you only have strings:
df2_2['Concat'] = df2_2['Concat'].map(lambda x: str(x))
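Since the error message asks for two strings, another way to apply the suggestions above is to compare individual values one pair at a time rather than whole objects. Here is a minimal sketch of that idea. The names levenshtein, pairwise_distances, b_values and c_values are hypothetical, the sample data is made up, and the plain-Python levenshtein function is only a stand-in so the example runs without extra packages. With the library from the question you would presumably call lev.distance(str(x), str(y)) at the same spot.

```python
def levenshtein(s, t):
    """Plain-Python edit distance; a stand-in for lev.distance(s, t)."""
    if len(s) < len(t):
        s, t = t, s  # keep the shorter string in the inner loop
    previous = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        current = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def pairwise_distances(left_values, right_values):
    """Compare every left value with every right value, each cast to str."""
    results = []
    for left in left_values:
        for right in right_values:
            left_s, right_s = str(left), str(right)
            results.append((left_s, right_s, levenshtein(left_s, right_s)))
    return results


# Hypothetical sample values standing in for b.values and c.values.
b_values = ["kitten"]
c_values = ["sitting", 1234]

for left, right, dist in pairwise_distances(b_values, c_values):
    print(left, right, dist)
```

The point of the design is that str() is applied to each value, not to the container, so the distance is computed only on the cell contents.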