I have tried every way I know to check whether a keyword is similar to a movie title in my dataset.csv, but nothing works. It only recommends movies if the title exactly matches one in the dataset. For example, if I search for Spider-Man 3, it recommends related movies, but if I search for spider man 3, it does not know what I mean and shows an error.
import pandas as pd
import openpyxl
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.feature_extraction.text import CountVectorizer
df = pd.read_csv('dataset.csv')
df.head(3)
df['Movie_id'] = range(len(df))  # one id per row, instead of assuming exactly 1000 rows
#print(df.head(10))
#print(df.shape)
columns = ['Actors', 'Director', 'Genre', 'Title']
#print(df[columns].head(3))
#print(df[columns].isnull().values.any())
def important(data):
    features = []
    for i in range(0, data.shape[0]):
        features.append(data['Actors'][i]+' '+data['Director'][i]+' '+data['Genre'][i]+' '+data['Title'][i])
    return features
df['features'] = important(df)
#print(df.head(3))
cm = CountVectorizer().fit_transform(df['features'])
cs = cosine_similarity(cm)
print(cs)
print(cs.shape)
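title = "Spider-Man 3"  # works: exact match with a title in the dataset
#title = "spider man"  # doesn't work: no exact match, so .values[0] below raises IndexError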
movie_id = df[df.Title == title]['Movie_id'].values[0]
scores = list(enumerate(cs[movie_id]))
sorted_Scores = sorted(scores, key = lambda x:x[1], reverse=True)
sorted_Scores = sorted_Scores[1:]
print(sorted_Scores)
a = 0
print("The 10 most recommended movies for", title, "are:")
for item in sorted_Scores:
    movie_title = df[df.Movie_id == item[0]]['Title'].values[0]
    print(a+1, movie_title)
    a += 1
    if a > 9:
        break
So how can I make this code keyword-based?
You can use the fuzzywuzzy library.
from fuzzywuzzy import process, fuzz
titles = df['Title'].unique().tolist()
fuzzy_matches = process.extract('Spider-Man 3', titles, scorer=fuzz.token_set_ratio)
After this, fuzzy_matches should contain (title, score) tuples, sorted from most to least similar. You can then grab the best-fitting title and search for it.
Like this:
best_fitting_title = fuzzy_matches[0][0]
movie_id = df[df.Title == best_fitting_title]['Movie_id'].values[0]
I didn't test it completely because I don't have sample data, but it should work.
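If installing an extra library is not an option, roughly the same idea can be sketched with the standard library's difflib. This is a different technique from fuzzywuzzy, not a drop-in equivalent. The helper names normalize and closest_title, the sample titles, and the 0.6 cutoff are assumptions made for illustration; in the real code, titles would come from df['Title'].

```python
import difflib
import re


def normalize(title):
    """Lowercase and turn punctuation into spaces, so 'Spider-Man 3' becomes 'spider man 3'."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", title.lower()).split())


def closest_title(query, titles, cutoff=0.6):
    """Return the title most similar to query, or None if nothing reaches the cutoff."""
    # Map each normalized title back to its original spelling in the dataset.
    lookup = {normalize(t): t for t in titles}
    matches = difflib.get_close_matches(normalize(query), list(lookup), n=1, cutoff=cutoff)
    return lookup[matches[0]] if matches else None


# Hypothetical sample data standing in for df['Title'].unique().tolist()
titles = ["Spider-Man 3", "Iron Man 2", "The Dark Knight"]
best = closest_title("spider man 3", titles)
print(best)
```

The returned title can then be used in df[df.Title == best] exactly as before. Checking for None first avoids the IndexError when nothing in the dataset is close enough to the keyword.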