Tags: search, nltk, textblob, nlp

How do I create a search using NLP techniques which searches an inputted named entity as well as any potential name variations it may have?


I’m currently using TextBlob to build a chatbot. So far I have been extracting named entities by running noun phrase extraction and keeping the phrases that contain a word with the POS tag NNP. For a test question such as ‘Will Smith’s latest single?’, I correctly retrieve ‘Will Smith’. However, I want to search not only for ‘will smith’ but also for ‘william smith’, ‘bill smith’, ‘willie smith’ and ‘billy smith’, i.e. other commonly known English variations of the name. I am using the Spotipy API because I am trying to retrieve Spotify artists. What I'm currently doing in PyCharm:

from textblob import TextBlob

# spotifyObject is assumed to be an authenticated spotipy.Spotify client created earlier.
while True:
    response = input()
    searchQuery = TextBlob(response)
    who = []
    # Collect every noun phrase that contains at least one proper noun (NNP).
    for item, tag in searchQuery.tags:
        if tag == "NNP":
            for nounPhrase in searchQuery.noun_phrases:
                np = TextBlob(nounPhrase)
                if item.lower() in np.words and nounPhrase not in who:
                    who.append(nounPhrase)

    print(who)
    for name in who:
        # Search once and reuse the result instead of calling the API twice.
        searchResults = spotifyObject.search(name, limit=50, offset=0, type='artist')
        for a in searchResults['artists']['items']:
            print(a['name'])

Solution

  • Quick question:

    Why would you want 'Bill Smith' to appear in the same search as Will Smith? I believe they are two different artists.

    Option 1

    If I understand your question correctly, you may want to use a regular expression that accepts any first name in front of the artist's surname.

    For example: name LIKE %(any first name)% + smith

    I assume a result such as "Will Sutton" would be invalid in your case.
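A minimal Python sketch of Option 1, assuming the filtering is done on the artist names after the search returns them. The helper `filter_by_surname` and the sample names are hypothetical and are not part of Spotipy or TextBlob.

```python
import re

def filter_by_surname(artist_names, surname):
    """Keep names made of at least one leading name followed by the given surname."""
    # ".+" stands for any first (and middle) names; the surname must end the string.
    pattern = re.compile(r".+\s" + re.escape(surname), re.IGNORECASE)
    return [n for n in artist_names if pattern.fullmatch(n.strip())]

# Hypothetical result list, standing in for the names returned by the search.
candidates = ["Will Smith", "Willie Smith", "Bill Smith", "Will Sutton", "Smith"]
matches = filter_by_surname(candidates, "smith")
print(matches)
```

With this pattern "Will Sutton" is rejected because the surname differs, and a bare "Smith" is rejected because no first name precedes it.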


    Option 2

    Do you want something similar to spaCy's sense2vec feature, which returns related words together with a percentage similarity? You could set a threshold so that only results with a similarity above 70% are returned, for example. https://explosion.ai/demos/sense2vec
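The thresholding idea in Option 2 could look roughly like the sketch below. Note that this is a stand-in, not sense2vec: it uses character-level similarity from Python's standard-library `difflib`, which compares spelling rather than meaning. The function `similar_names`, the sample names, and the 0.7 cut-off are assumptions for illustration.

```python
from difflib import SequenceMatcher

def similar_names(query, candidates, threshold=0.7):
    """Return (name, score) pairs whose similarity to the query meets the threshold."""
    scored = []
    for name in candidates:
        # ratio() is a float in [0, 1]; 1.0 means the two strings are identical.
        score = SequenceMatcher(None, query.lower(), name.lower()).ratio()
        if score >= threshold:
            scored.append((name, score))
    # Best matches first.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical candidate list, standing in for artist names from a search.
candidates = ["Will Smith", "William Smith", "Bill Smith", "Will Sutton", "Taylor Swift"]
results = similar_names("will smith", candidates)
for name, score in results:
    print(f"{name}: {score:.0%}")
```

Because the score reflects spelling only, nicknames that are spelled very differently from the query (for example a short form that shares few letters) may still fall below the threshold, so the cut-off needs tuning against real data.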

    If this is not useful, please explain your question in a bit more detail (for example, what makes a search result valid).

    Thanks