How can I compare the strings in one column with the strings in another column of the same DataFrame, calculate the percentage of matching characters in a result column, and label each row as a full match, a partial match, or no match at all?
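Try this. It compares the two strings position by position and reports the share of characters in Column_01 that match: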
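import pandas as pd

table = {
    'Column_01': ['Apple', 'Mango', 'Banana', 'Coconut', 'Pineaple', 'Guava'],
    'Column_02': ['Apple', 'Man', 'Fruits', 'Cocon', 'Pin', 'Guava'],
}
tf1 = pd.DataFrame(table)
print(tf1)
print('\n\n-------------BREAK-----------\n\n')

def func(x):
    # x is one row; compare the two strings character by character, position by position.
    col1, col2 = x['Column_01'], x['Column_02']
    len_col1 = len(col1)
    if len_col1 == 0:
        return 0.0  # avoid dividing by zero on an empty string
    # zip() stops at the shorter string, so positions past its end never count as matches.
    cont = sum(a == b for a, b in zip(col1, col2))
    # The percentage is relative to the length of Column_01.
    return round((cont * 100) / len_col1, 2)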

def func2(x):
    # Map the percentage to a status label.
    if x == 0:
        return 'No match at all'
    elif x < 100:
        return 'Partial match'
    else:
        return 'Full match'

tf1['% of Matching Strings'] = tf1.apply(func, axis=1)
tf1['Status'] = tf1['% of Matching Strings'].apply(func2)
print(tf1)

OUTPUT

  Column_01  Column_02
0     Apple      Apple
1     Mango        Man
2    Banana     Fruits
3   Coconut      Cocon
4  Pineaple        Pin
5     Guava      Guava

-------------BREAK-----------

   Column_01  Column_02  % of Matching Strings           Status
0      Apple      Apple                 100.00       Full match
1      Mango        Man                  60.00    Partial match
2     Banana     Fruits                   0.00  No match at all
3    Coconut      Cocon                  71.43    Partial match
4   Pineaple        Pin                  37.50    Partial match
5      Guava      Guava                 100.00       Full match
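
To see where a figure such as 71.43 comes from, the same positional comparison can be checked outside pandas. This is only a minimal sketch: the helper name positional_match_pct is hypothetical (it is not part of the answer's code), and it assumes the percentage is always taken relative to the length of the first string.

```python
def positional_match_pct(a: str, b: str) -> float:
    """Percentage of characters in `a` that equal the character at the same index in `b`."""
    if not a:
        return 0.0  # assumption: an empty first string counts as 0% rather than raising an error
    # zip() pairs characters index by index and stops at the shorter string.
    matches = sum(ca == cb for ca, cb in zip(a, b))
    return round(matches * 100 / len(a), 2)


pairs = [('Coconut', 'Cocon'), ('Pineaple', 'Pin'), ('Banana', 'Fruits'), ('Man', 'Mango')]
for first, second in pairs:
    print(first, second, positional_match_pct(first, second))
```

'Coconut' against 'Cocon' gives 5 matching positions out of 7, which is 71.43. Note the last pair: because the percentage is relative to the first string only, 'Man' against 'Mango' comes out as 100 even though the strings differ. If that matters for your data, one option is to divide by the length of the longer string instead.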