Tags: python, python-3.x, nlp, cosine-similarity

NLP - Find Similar/Phonetic word and calculate score in a paragraph


I'm developing a simple NLP project in which we are given a set of keywords and need to find similar or phonetically similar words in a text. I've found plenty of algorithms, but no sample application.

It should also give a similarity score for each keyword and the matching word that was found.

Can anyone help me out?

    from collections import Counter
    from math import sqrt

    def word2vec(word):
        # Represent a word as (character counts, character set, vector length).
        cw = Counter(word)
        sw = set(cw)
        lw = sqrt(sum(c*c for c in cw.values()))
        return cw, sw, lw

    def cosdis(v1, v2):
        # Cosine similarity of two character-count vectors from word2vec().
        common = v1[1].intersection(v2[1])
        return sum(v1[0][ch]*v2[0][ch] for ch in common)/v1[2]/v2[2]

    list_A = ['e-commerce', 'ecomme', 'e-commercy', 'ecomacy', 'E-Commerce']
    list_B = ['E-Commerce']

    for word in list_A:
        for key in list_B:
            res = cosdis(word2vec(word), word2vec(key))
            print(res)

This code only does word-to-word comparison; it does not find matches inside a text.



Solution

  • I think you are looking for something like a library that first converts each word into IPA symbols (a form of phonetic notation), so that you can then compare the IPA strings.

    from collections import Counter
    from math import sqrt
    import eng_to_ipa as ipa
    
    def word2vec(word):
        cw = Counter(word)
        sw = set(cw)
        lw = sqrt(sum(c*c for c in cw.values()))
        return cw, sw, lw
    
    def cosdis(v1, v2):
        common = v1[1].intersection(v2[1])
        return sum(v1[0][ch]*v2[0][ch] for ch in common)/v1[2]/v2[2]
    
    list_A = ['e-commerce', 'ecomme', 'e-commercy', 'ecomacy', 'E-Commerce']
    list_B = ['E-Commerce']
    
    IPA_list_a = []
    IPA_list_b = []
    for each in list_A:
        IPA_list_a.append(ipa.convert(each))
    for each in list_B:
        IPA_list_b.append(ipa.convert(each))
    
    for word in IPA_list_a:
        for key in IPA_list_b:
            res = cosdis(word2vec(word), word2vec(key))
            print(res)
    

    Check this out: https://github.com/mphilli/English-to-IPA

    >>> import eng_to_ipa as ipa
    >>> ipa.convert("The quick brown fox jumped over the lazy dog.")
    'ðə kwɪk braʊn fɑks ʤəmpt ˈoʊvər ðə ˈleɪzi dɔg.'
    

    This example is taken from the GitHub link above.
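
To go from word-to-word comparison to searching a paragraph, one option is to split the text into words and score each one against every keyword. The following is only a rough sketch under a few assumptions: `find_similar`, the `threshold` default of 0.8, the tokenizing regex, and the sample text are all made up for illustration, and both sides are lower-cased so that 'E-Commerce' and 'e-commerce' compare as equal. It uses the plain character-count vectors from the code above; to compare phonetically, the words could be passed through `ipa.convert` first.

```python
import re
from collections import Counter
from math import sqrt

def word2vec(word):
    # Character-count vector: (counts, character set, vector length).
    cw = Counter(word)
    return cw, set(cw), sqrt(sum(c * c for c in cw.values()))

def cosdis(v1, v2):
    # Cosine similarity between two character-count vectors.
    common = v1[1].intersection(v2[1])
    return sum(v1[0][ch] * v2[0][ch] for ch in common) / v1[2] / v2[2]

def find_similar(text, keywords, threshold=0.8):
    # Hypothetical helper (not from the original post): returns a list of
    # (keyword, word, score) for each word in the text whose similarity to
    # a keyword is at least `threshold`. The threshold value is an assumption.
    words = re.findall(r"[\w-]+", text.lower())
    matches = []
    for key in keywords:
        key_vec = word2vec(key.lower())
        for word in words:
            score = cosdis(word2vec(word), key_vec)
            if score >= threshold:
                matches.append((key, word, round(score, 3)))
    return matches

# Sample text invented for this sketch.
text = "Our ecommerce site sells books. E-commercy is a typo for e-commerce."
for key, word, score in find_similar(text, ["E-Commerce"]):
    print(key, word, score)
```

Keep in mind that character-count vectors ignore letter order, so anagrams get a score of 1.0; the threshold will probably need tuning on real data.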