Tags: python, pandas, dataframe, recommendation-engine

How can I check whether a search keyword is similar to a movie title in my recommendation system?


I have tried every way I know to check whether a keyword is similar to a movie title in my dataset.csv, but nothing works. The system only recommends movies when the title matches the dataset exactly. For example, if I search for Spider-Man 3 it recommends related movies, but if I search for spider man 3 it doesn't know what I mean and shows an error.
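The lookup `df[df.Title == title]` in the code below is an exact string comparison, so "spider man 3" and "Spider-Man 3" are simply different strings and no row matches. A minimal sketch of one common fix, assuming it is acceptable to ignore case and punctuation: normalize both the query and the stored titles before comparing. `normalize_title` is a hypothetical helper, not part of pandas.

```python
import re


def normalize_title(title: str) -> str:
    """Hypothetical helper: casefold, replace each run of non-alphanumeric
    characters (punctuation, underscores, whitespace) with one space, trim."""
    return re.sub(r"[\W_]+", " ", title.casefold()).strip()


titles = ["Spider-Man 3", "The Dark Knight", "Guardians of the Galaxy"]
query = "spider man 3"

# Exact comparison, like df.Title == title: finds nothing.
exact = [t for t in titles if t == query]
# Comparison on normalized forms: "Spider-Man 3" -> "spider man 3".
normalized = [t for t in titles if normalize_title(t) == normalize_title(query)]

print(exact)       # []
print(normalized)  # ['Spider-Man 3']
```

On the question's DataFrame this would look something like `df[df['Title'].map(normalize_title) == normalize_title(title)]`. Normalization only absorbs case and punctuation differences; it will not match "spiderman 3" or a misspelling, which is where fuzzy matching comes in.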

import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.feature_extraction.text import CountVectorizer

df = pd.read_csv('dataset.csv')
df.head(3)
df['Movie_id'] = range(len(df))  # one id per row, matching the row order of the similarity matrix
#print(df.head(10))
#print(df.shape)
columns = ['Actors', 'Director', 'Genre', 'Title']
#print(df[columns].head(3))
#print(df[columns].isnull().values.any())

def important(data):
    # Concatenate the text columns into one feature string per movie
    return data['Actors'] + ' ' + data['Director'] + ' ' + data['Genre'] + ' ' + data['Title']

df['features'] = important(df)

#print(df.head(3))

cm = CountVectorizer().fit_transform(df['features'])
cs = cosine_similarity(cm)
print(cs)
print(cs.shape)
title = "Spider-Man 3"  # works
# title = "spider man"  # doesn't work: no exact match, so .values[0] raises IndexError
movie_id = df[df.Title == title]['Movie_id'].values[0]

scores = list(enumerate(cs[movie_id]))
sorted_Scores = sorted(scores, key = lambda x:x[1], reverse=True)
sorted_Scores = sorted_Scores[1:]
print(sorted_Scores)

a = 0
print("The 10 most recommended movies for", title, 'are:')
for item in sorted_Scores:
    movie_title = df[df.Movie_id == item[0]]['Title'].values[0]
    print(a+1, movie_title)
    a += 1
    if a > 9:
        break
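The ranking step above drops the first element of `sorted_Scores`, assuming the query movie always sorts first. That is not guaranteed: if another movie has an equal similarity score (for example an identical feature string), Python's stable sort can place it ahead of the query, and the query movie then shows up in its own recommendations. A sketch that excludes the query by index instead; `top_similar` is a hypothetical name, and `row` stands for one row of the similarity matrix such as `cs[movie_id]`.

```python
def top_similar(row, query_index, n=10):
    """Return (index, score) pairs for the n highest scores in `row`,
    skipping the query movie by its index rather than by sort position."""
    scored = [(i, s) for i, s in enumerate(row) if i != query_index]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


# Toy similarity row for a 5-movie dataset; movie 2 is the query and
# movie 1 ties with it at 1.0.
row = [0.1, 1.0, 1.0, 0.4, 0.7]
print(top_similar(row, query_index=2, n=3))  # [(1, 1.0), (4, 0.7), (3, 0.4)]
```

With this toy row, the original `sorted(...)[1:]` approach would keep index 2 (the query itself) in the results. With the question's variables, the printing loop could become `for rank, (i, score) in enumerate(top_similar(cs[movie_id], movie_id), start=1): print(rank, df['Title'][i])`.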

So how can I make this code match on keywords instead of requiring the exact title?


Solution

  • You can use the fuzzywuzzy library (since renamed to thefuzz, with the same API).

    from fuzzywuzzy import process, fuzz
    titles = df['Title'].unique().tolist()
    fuzzy_matches = process.extract(title, titles, scorer=fuzz.token_set_ratio)
    

    After this, fuzzy_matches holds a list of (title, score) tuples, sorted from the best match to the worst (five by default). You can then take the best-fitting title and look it up as before:

    best_fitting_title = fuzzy_matches[0][0]
    movie_id = df[df.Title == best_fitting_title]['Movie_id'].values[0]
    

    I didn't test it completely because I don't have sample data, but it should work.
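If you would rather not add a dependency, Python's standard library has `difflib.get_close_matches`, which ranks candidates by `SequenceMatcher` ratio. That is a different similarity measure from fuzzywuzzy's `token_set_ratio`, so scores and rankings will not be identical. A minimal sketch, assuming titles are normalized first and that returning `None` is the right behavior when nothing is close enough; `best_title_match` and the 0.6 cutoff are illustrative choices, not tuned values.

```python
import difflib
import re


def normalize_title(title):
    # Casefold and collapse punctuation/whitespace runs into single spaces.
    return re.sub(r"[\W_]+", " ", title.casefold()).strip()


def best_title_match(query, titles, cutoff=0.6):
    """Return the original title closest to `query`, or None if no title
    reaches `cutoff` (SequenceMatcher ratio on the normalized strings)."""
    # Map each normalized title back to its original spelling.
    # Simplification: if two titles normalize identically, the later one wins.
    by_key = {normalize_title(t): t for t in titles}
    matches = difflib.get_close_matches(
        normalize_title(query), list(by_key), n=1, cutoff=cutoff
    )
    return by_key[matches[0]] if matches else None


titles = ["Spider-Man 3", "The Amazing Spider-Man", "The Dark Knight"]
print(best_title_match("spider man 3", titles))  # Spider-Man 3
print(best_title_match("spiderman 3", titles))   # Spider-Man 3
print(best_title_match("zzzz", titles))          # None
```

With the question's DataFrame, `best_title_match(title, df['Title'].tolist())` would give a title to look up with `df[df.Title == match]`, and a `None` result can be reported to the user instead of raising IndexError. The same idea applies to the fuzzywuzzy version: `process.extractOne` returns only the best `(title, score)` pair and accepts a `score_cutoff`, returning `None` when nothing reaches it, whereas `fuzzy_matches[0][0]` always takes the top candidate however weak it is.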