Let's say I have two Excel files, each containing a column of names and dates.
Excel 1:
Name
0 Bla bla bla June 04 2018
1 Puppy Dog June 01 2017
2 Donald Duck February 24 2017
3 Bruno Venus April 24 2019
Excel 2:
Name
0 Pluto Feb 09 2019
1 Donald Glover Feb 22 2020
2 Dog Feb 22 2020
3 Bla Bla Feb 22 2020
I want to compare each cell in the first column with each cell in the second column and then find the closest match.
The following function returns a similarity ratio between 0 and 1 that indicates how closely two input strings match.
SequenceMatcher code example:
from difflib import SequenceMatcher

def similar(a, b):
    return SequenceMatcher(None, a, b).ratio()

x = "Adam Clausen a Feb 09 2019"
y = "Adam Clausen Feb 08 2019"
print(similar(x, y))
Output: 0.92
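To see where that number comes from, here is a small sketch that recomputes the ratio by hand. It relies on the documented definition of `ratio()` as 2 * M / T, where M is the number of matched characters and T is the combined length of both strings. The variable names `matched` and `manual_ratio` are only illustrative.

```python
from difflib import SequenceMatcher

x = "Adam Clausen a Feb 09 2019"
y = "Adam Clausen Feb 08 2019"

matcher = SequenceMatcher(None, x, y)

# M: total number of characters that fall inside matching blocks.
# The final block returned is a zero-size sentinel, so it adds nothing.
matched = sum(block.size for block in matcher.get_matching_blocks())

# ratio() is defined as 2 * M / T, with T the combined length of both strings.
manual_ratio = 2 * matched / (len(x) + len(y))

print(matched, manual_ratio, matcher.ratio())
```

Because the ratio depends on the combined length, extra text such as a trailing date lowers the score even when the names themselves are identical.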
If you know how to load the columns into a DataFrame, this code should get the job done:
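from difflib import SequenceMatcher

col_1 = ['potato', 'tomato', 'apple']
col_2 = ['tomatoe', 'potatao', 'appel']

def similar(a, b):
    # Return the ratio first so that max() ranks candidates by similarity.
    ratio = SequenceMatcher(None, a, b).ratio()
    matches = a, b
    return ratio, matches

for i in col_1:
    # Print the best match in col_2 for each entry of col_1.
    print(max(similar(i, j) for j in col_2))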
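Applied to the data from the question, the same idea could be wrapped in a helper like the one below. This is only a sketch: `best_matches` is a made-up name, and the two lists are typed in by hand here. With pandas they could presumably come from something like `df["Name"].tolist()`, assuming the column is called `Name` as in the printout above.

```python
from difflib import SequenceMatcher


def best_matches(col_1, col_2):
    """For each string in col_1, return (string, best candidate in col_2, ratio)."""
    results = []
    for a in col_1:
        # Compare `a` against every entry of col_2 and keep the highest ratio.
        best = max(col_2, key=lambda b: SequenceMatcher(None, a, b).ratio())
        results.append((a, best, SequenceMatcher(None, a, best).ratio()))
    return results


# Sample rows copied from the question (hypothetical stand-ins for the Excel columns).
excel_1 = [
    "Bla bla bla June 04 2018",
    "Puppy Dog June 01 2017",
    "Donald Duck February 24 2017",
    "Bruno Venus April 24 2019",
]
excel_2 = [
    "Pluto Feb 09 2019",
    "Donald Glover Feb 22 2020",
    "Dog Feb 22 2020",
    "Bla Bla Feb 22 2020",
]

for name, match, ratio in best_matches(excel_1, excel_2):
    print(f"{name!r} -> {match!r} ({ratio:.2f})")
```

Note that this compares every pair, so the work grows with len(col_1) * len(col_2). The comparison is also case-sensitive, so lowercasing both columns first may give better results for names like "Bla bla bla" and "Bla Bla".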