I have a dataframe with only one column and 1000 rows. I need to compare every row with every other row and find the Levenshtein distance for each pair. How do I calculate that ratio or distance in Python?
My dataframe looks like this:
#Df
StepDescription
click confirm button when done
you have logged on
please log in to proceed
click on confirm button
Dolb was released successfully
Enter your details
validate the statement
Aval was released sucessfully
How do I calculate the Levenshtein ratio for all of these?
This is the code I have written to iterate through the rows, but I don't know how to proceed after iterating:
import Levenshtein
import pandas as pd
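data_dist = pd.read_csv(r'path\Data_TestDescription.csv')  # raw string, so the backslash is not read as an escape
df = pd.DataFrame(data_dist)  # read_csv already returns a DataFrame, so this copy is optional
for index, row in df.iterrows():
    pass  # not sure how to compare this row against all the other rows here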
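One way to finish the loop is a nested loop that compares row i with row j for every pair and stores the result in an n x n matrix. This is a minimal sketch, assuming the texts have first been pulled out of the column as a plain list (for example with `df['StepDescription'].tolist()`). The helper names `levenshtein` and `distance_matrix` are hypothetical, and the pure-Python `levenshtein` is only a stand-in for `Levenshtein.distance` so the sketch runs without extra packages.

```python
from typing import List, Sequence


def levenshtein(a: str, b: str) -> int:
    """Hypothetical stand-in for Levenshtein.distance: the minimum number of
    single-character insertions, deletions and substitutions that turn a into b,
    computed with two rows of the classic dynamic-programming table."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


def distance_matrix(texts: Sequence[str]) -> List[List[int]]:
    """Compare every text with every other text; entry [i][j] is the distance
    between texts[i] and texts[j]."""
    n = len(texts)
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            matrix[i][j] = levenshtein(texts[i], texts[j])
    return matrix
```

With pandas, `pd.DataFrame(distance_matrix(df['StepDescription'].tolist()))` gives an n x n table of the same shape as the one in the answer. For 1000 rows, the compiled `distance` from the `Levenshtein` package will be much faster than this pure-Python version.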
Since a comment asked for a percentage, I'll keep the accepted answer and just add the new part:
import numpy as np
import pandas as pd
from Levenshtein import distance
from itertools import product
#df = ...
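# distance for every ordered pair (a, b), row by row: n * n values in total
dist = [distance(*x) for x in product(df.StepDescription, repeat=2)]
# reshape the flat list into an n x n matrix; entry (i, j) compares row i with row j
dist_df = pd.DataFrame(np.array(dist).reshape(df.shape[0], df.shape[0]))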
dist_df
0 1 2 3 4 5 6 7
0 0 23 23 13 29 25 25 28
1 23 0 18 18 23 18 18 23
2 23 18 0 20 25 21 19 24
3 13 18 20 0 27 19 21 26
4 29 23 25 27 0 26 23 5
5 25 18 21 19 26 0 19 25
6 25 18 19 21 23 19 0 21
7 28 23 24 26 5 25 21 0
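Because Levenshtein distance is symmetric (d(a, b) == d(b, a)) and every string is at distance 0 from itself, the matrix above holds each value twice. With 1000 rows, `product(..., repeat=2)` makes 1,000,000 calls; evaluating only the pairs i < j and mirroring them needs 499,500. The sketch below is a hypothetical helper, `pairwise_symmetric`, that takes the distance function as a parameter so that `distance` from the `Levenshtein` import above can be passed in.

```python
from itertools import combinations
from typing import Callable, List, Sequence


def pairwise_symmetric(texts: Sequence[str],
                       dist_fn: Callable[[str, str], int]) -> List[List[int]]:
    """Fill an n x n distance matrix by evaluating only the pairs i < j.

    Assumes dist_fn is symmetric and returns 0 for identical strings, which
    holds for Levenshtein distance: the diagonal is left at 0 and each
    computed value is copied to its mirror position.
    """
    n = len(texts)
    matrix = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):  # n * (n - 1) / 2 pairs
        d = dist_fn(texts[i], texts[j])
        matrix[i][j] = d
        matrix[j][i] = d
    return matrix
```

Usage, assuming the imports from the answer: `dist_df = pd.DataFrame(pairwise_symmetric(df.StepDescription.tolist(), distance))`.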
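# each distance as a percentage of the smallest non-zero distance; true division (not //), so 23 / 5 * 100 gives 460
dist_df_percentage = (dist_df / min(x for x in dist if x > 0) * 100).round().astype(int)
dist_df_percentage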
0 1 2 3 4 5 6 7
0 0 460 460 260 580 500 500 560
1 460 0 360 360 460 360 360 460
2 460 360 0 400 500 420 380 480
3 260 360 400 0 540 380 420 520
4 580 460 500 540 0 520 460 100
5 500 360 420 380 520 0 380 500
6 500 360 380 420 460 380 0 420
7 560 460 480 520 100 500 420 0
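These numbers express each distance relative to the smallest non-zero distance (5 here), so they are not bounded by 100. If a 0-100 similarity score per pair is wanted instead, one common option, assumed here and not taken from the answer above, is to divide each distance by the length of the longer string in the pair, which is an upper bound on the Levenshtein distance. `similarity_percent` is a hypothetical helper. This is not the normalization `Levenshtein.ratio` uses, so its numbers will generally differ from that function's.

```python
from typing import List, Sequence


def similarity_percent(texts: Sequence[str],
                       dist: List[List[int]]) -> List[List[float]]:
    """Turn raw Levenshtein distances into 0-100 similarity scores.

    Each distance is divided by the length of the longer string in the pair,
    so 100 means identical and 0 means the distance equals that length.
    """
    n = len(texts)
    result = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            longest = max(len(texts[i]), len(texts[j]))
            if longest == 0:
                result[i][j] = 100.0  # two empty strings are identical
            else:
                result[i][j] = 100.0 * (1 - dist[i][j] / longest)
    return result
```

For example, `pd.DataFrame(similarity_percent(df.StepDescription.tolist(), dist_df.values.tolist())).round(1)` would score rows 4 and 7 (distance 5, longer string 30 characters) at about 83.3.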