Tags: python, nlp, sequence, sentence, difflib

Matching the content of sentences/word sequences in Python


Suppose we have the following sequences of words:

    sentence1 = 'Ram is eating'
    sentence2 = 'is Ram  eating'
    sentence3 = 'is Ram playing'
    sentence4 = 'movie Ram watching is'
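sentence2 contains two consecutive spaces between 'Ram' and 'eating'. A word-level comparison has to split each sentence into words first, and the choice of split matters here. This small sketch shows the standard behaviour of Python's str.split: with no argument it splits on runs of whitespace, while splitting on a single space leaves an empty string behind.

```python
sentence2 = 'is Ram  eating'  # note the two spaces between 'Ram' and 'eating'

# With no argument, str.split() splits on runs of whitespace and drops empty strings.
print(sentence2.split())     # ['is', 'Ram', 'eating']

# Splitting on a single space keeps an empty string for the double space.
print(sentence2.split(' '))  # ['is', 'Ram', '', 'eating']
```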

How can I get the match% of two such sequences? difflib's SequenceMatcher, when given two strings, matches them character by character. Is there any way to find the match% in these cases?
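SequenceMatcher can also be given lists of words instead of strings, in which case it compares whole words rather than letters. It still only counts matches that appear in the same relative order, though, so a reordered sentence does not score 100%. A minimal sketch using only the documented difflib API:

```python
from difflib import SequenceMatcher

sentence1 = 'Ram is eating'
sentence2 = 'is Ram  eating'

# Passing lists of words makes SequenceMatcher compare whole words, not characters.
matcher = SequenceMatcher(None, sentence1.split(), sentence2.split())

# ratio() is 2*M/T, where M is the number of elements matched in order and
# T is the total number of elements in both lists. Here 'Ram' and 'eating'
# are matched; 'is' would have to cross the 'Ram' match, so it is left out.
print(matcher.ratio())  # 2*2/6, about 0.667 rather than 1.0
```

This is why an order-insensitive comparison (for example with sets or counts of words) fits the definition in the question better.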

match% between sentence1 and sentence2 = 3/3 i.e. 100%

match% between sentence1 and sentence3 = 2/3 i.e. 66.66%

match% between sentence1 and sentence4 = 2/3 i.e. 66.66%

match% = (number of words of sentence1 that also appear in sentence2, irrespective of position) / (total number of words in sentence1) * 100
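The formula above can be sketched directly as a function. The name match_percent is hypothetical, and the handling of repeated words is an assumption: here collections.Counter intersection is used, so a word that occurs twice in sentence1 only counts twice if it also occurs twice in sentence2.

```python
from collections import Counter


def match_percent(sentence1, sentence2):
    """Percentage of sentence1's words that also occur in sentence2, ignoring order."""
    words1 = sentence1.split()
    if not words1:
        return 0.0  # avoid dividing by zero for an empty sentence1
    # Counter intersection keeps the minimum count of each word (assumed
    # handling of repeated words; a plain set would ignore repeats).
    common = Counter(words1) & Counter(sentence2.split())
    return sum(common.values()) / len(words1) * 100


sentence1 = 'Ram is eating'
for other in ['is Ram  eating', 'is Ram playing', 'movie Ram watching is']:
    print(f'{match_percent(sentence1, other):.2f}%')
# 100.00%
# 66.67%
# 66.67%
```

The denominator is always the length of sentence1, as in the question, so sentence4's extra words ('movie', 'watching') do not lower the score.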

Solution

  • How about converting each string to a list of words and then computing the matching percentage?

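    sentence1 = 'Ram is eating'
    sentence2 = 'is Ram  eating'

    # str.split() with no argument splits on any run of whitespace
    sentence1 = sentence1.split()
    sentence2 = sentence2.split()

    # The denominator is the word count of the longer sentence
    longest = max(sentence1, sentence2, key=len)

    # Number of distinct words the two sentences share, regardless of position
    per = len(set(sentence1) & set(sentence2))
    result = per / len(longest)
    print(f'{result * 100}% matched')

    Output: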

    100.0% matched
    

    Case 2

    sentence1 = 'Ram is eating'
    sentence2 = 'is Ram'

    sentence1 = sentence1.split()
    sentence2 = sentence2.split()
    longest = max(sentence1, sentence2, key=len)

    # 2 shared words ('Ram', 'is') out of 3 words in the longer sentence
    per = len(set(sentence1) & set(sentence2))
    result = per / len(longest)
    print(f'{result * 100}% matched')

    Output:

    66.66666666666666% matched
    

    Case 3

    sentence1 = 'Ram is eating'
    sentence3 = 'is Ram playing'

    sentence1 = sentence1.split()
    sentence3 = sentence3.split()

    longest = max(sentence1, sentence3, key=len)

    # 'Ram' and 'is' are shared; 'eating' and 'playing' differ
    per = len(set(sentence1) & set(sentence3))
    result = per / len(longest)
    print(f'{result * 100}% matched')

    Output:

    66.66666666666666% matched
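The three cases above repeat the same steps, so they could be wrapped in a helper. The name longest_based_percent below is made up for illustration. Note that dividing by the longer sentence is not quite the formula from the question: for sentence1 and sentence4 it gives 2/4 = 50%, whereas the question expects 2/3 = 66.66%. Dividing by the number of words in sentence1 instead would match the question's definition.

```python
def longest_based_percent(sentence_a, sentence_b):
    """Shared distinct words divided by the word count of the longer sentence."""
    words_a = sentence_a.split()
    words_b = sentence_b.split()
    longest = max(words_a, words_b, key=len)
    if not longest:
        return 0.0  # both sentences are empty
    common = len(set(words_a) & set(words_b))
    return common / len(longest) * 100


sentence1 = 'Ram is eating'
sentence4 = 'movie Ram watching is'
print(longest_based_percent(sentence1, sentence4))  # 50.0
```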