python pandas nlp fuzzywuzzy difflib

How to compare the strings in one column with the strings in another column of the same DataFrame and calculate the percentage that matches


How can I compare the strings in one column with the strings in another column of the same DataFrame, store the percentage of matching characters in a result column, and record in a second result column whether each pair is a full match, a partial match, or no match at all?



Solution

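  • Try this: compare the two strings character by character at each position, take the number of matches as a percentage of the length of the Column_01 string, then map that percentage to a status: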

    import pandas as pd

    table = {
        'Column_01': ['Apple', 'Mango', 'Banana', 'Coconut', 'Pineaple', 'Guava'],
        'Column_02': ['Apple', 'Man', 'Fruits', 'Cocon', 'Pin', 'Guava'],
    }
    tf1 = pd.DataFrame(table)

    print(tf1)

    print('\n\n-------------BREAK-----------\n\n')
    
    
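    def func(x):
        # x is one row; use .iloc for positional access, because x[0] on a
        # Series with string labels is deprecated in recent pandas versions.
        col1, col2 = x.iloc[0], x.iloc[1]
        len_col1 = len(col1)
        if len_col1 == 0:
            # Avoid dividing by zero when Column_01 holds an empty string.
            return 0.0
        # zip() stops at the shorter string, so no index error can occur.
        cont = sum(a == b for a, b in zip(col1, col2))
        # Share of Column_01's characters matched at the same position.
        return round((cont * 100) / len_col1, 2)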
    
    def func2(x):
        if x == 0:
            return 'Not match at all'
        elif x < 100:
            return 'Partial match'
        else:
            return 'Full match'
    
    tf1['% of Matching Strings'] = tf1.apply(func, axis=1)

    tf1['Status'] = tf1['% of Matching Strings'].apply(func2)

    print(tf1)


    OUTPUT

      Column_01 Column_02
    0     Apple     Apple
    1     Mango       Man
    2    Banana    Fruits
    3   Coconut     Cocon
    4  Pineaple       Pin
    5     Guava     Guava


    -------------BREAK-----------


      Column_01 Column_02  % of Matching Strings            Status
    0     Apple     Apple                 100.00        Full match
    1     Mango       Man                  60.00     Partial match
    2    Banana    Fruits                   0.00  Not match at all
    3   Coconut     Cocon                  71.43     Partial match
    4  Pineaple       Pin                  37.50     Partial match
    5     Guava     Guava                 100.00        Full match
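
  • Since the question is tagged difflib, here is a rough alternative sketch that scores each pair with the standard library's SequenceMatcher instead of the position-by-position comparison above. This is a different metric, not the method used in the answer: ratio() is 2*M/T, where M is the number of matched characters and T is the combined length of both strings, so the percentages will differ from the table above. The names similarity_pct and status are made up for this illustration, and the status labels are simply copied from the answer.

```python
from difflib import SequenceMatcher


def similarity_pct(a: str, b: str) -> float:
    """Return difflib's similarity ratio of a and b as a percentage."""
    # ratio() = 2*M/T (M = matched characters, T = len(a) + len(b)).
    return round(SequenceMatcher(None, a, b).ratio() * 100, 2)


def status(pct: float) -> str:
    """Map a percentage to the same three labels used in the answer."""
    if pct == 0:
        return 'Not match at all'
    if pct < 100:
        return 'Partial match'
    return 'Full match'


# Same sample pairs as the DataFrame in the answer.
pairs = [
    ('Apple', 'Apple'), ('Mango', 'Man'), ('Banana', 'Fruits'),
    ('Coconut', 'Cocon'), ('Pineaple', 'Pin'), ('Guava', 'Guava'),
]

for col1, col2 in pairs:
    pct = similarity_pct(col1, col2)
    print(f'{col1:<10}{col2:<10}{pct:>8.2f}  {status(pct)}')
```

    With the DataFrame from the answer, the same function could presumably be applied row-wise, for example tf1.apply(lambda r: similarity_pct(r['Column_01'], r['Column_02']), axis=1). Unlike the positional comparison, this score also drops when Column_02 is longer than Column_01, so 'Apple' against 'Apples' would not be reported as a full match.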