The data has two columns, title and genre. I am trying to print the title of every row whose genre matches the user's input.
Here is what I tried:
import pandas as pd
from nltk.tokenize import word_tokenize

#CSV READ & GENRE-TITLE
data = pd.read_csv("data.csv")
df_title = data["title"]
df_genre = data["genre"]
#TOKENIZE
tokenized_genre = [word_tokenize(i) for i in df_genre]
tokenized_title = [word_tokenize(i) for i in df_title]
#INPUT-DATA MATCH
search = {e.lower() for l in tokenized_genre for e in l}
choice = input('Please enter a word = ')
while choice != "exit":
    if choice.lower() in search:
        print(data.loc[data.genre == {choice}, 'title'])
    else:
        print("The movie of the genre doesn't exist")
    choice = input("Please enter a word = ")
But the result is always an empty Series:
Series([], Name: title, dtype: object)
How can I solve it?
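A quick way to debug a .loc filter like this is to print the boolean mask it receives. On the first sample rows from the edit (built in memory here instead of read from data.csv), even a plain string comparison with == is exact and case-sensitive over the whole cell, so only the row whose genre is the single word "Drama" matches:

```python
import pandas as pd

# First sample rows from the question, built in memory instead of read from data.csv
data = pd.DataFrame({
    "title": ["The Story of the Kelly Gang", "Den sorte drøm", "Cleopatra", "L'Inferno"],
    "genre": ["Biography, Crime, Drama", "Drama", "Drama, History",
              "Adventure, Drama, Fantasy"],
})

# Inspect the boolean mask that .loc receives, one value per row
mask = data.genre == "Drama"
print(mask.tolist())                     # [False, True, False, False]
print(data.loc[mask, "title"].tolist())  # ['Den sorte drøm']
```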
Edit: here are data samples for title:
0 The Story of the Kelly Gang
1 Den sorte drøm
2 Cleopatra
3 L'Inferno
4 From the Manger to the Cross; or, Jesus of
...
And for genres:
0 Biography, Crime, Drama
1 Drama
2 Drama, History
3 Adventure, Drama, Fantasy
4 Biography, Drama
...
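Because each genre cell in these samples is a comma-separated list, one option is to split it into individual genres and compare whole words. A minimal sketch, assuming genres are always separated by commas as in the samples; titles_with_genre is a hypothetical helper, and the rows are built in memory instead of read from data.csv:

```python
import pandas as pd

# Sample rows from the question (in-memory instead of data.csv)
data = pd.DataFrame({
    "title": ["The Story of the Kelly Gang", "Den sorte drøm", "Cleopatra",
              "L'Inferno", "From the Manger to the Cross; or, Jesus of"],
    "genre": ["Biography, Crime, Drama", "Drama", "Drama, History",
              "Adventure, Drama, Fantasy", "Biography, Drama"],
})

def titles_with_genre(df, choice):
    """Return titles whose comma-separated genre list contains `choice` as a whole genre."""
    # One entry per (row, single genre) pair; explode keeps the original row index
    genres = df["genre"].str.split(",").explode().str.strip().str.lower()
    # Row indices where at least one genre equals the user's word
    matching = genres[genres == choice.strip().lower()].index.unique()
    return df.loc[matching, "title"].tolist()

print(titles_with_genre(data, "Biography"))
# ['The Story of the Kelly Gang', 'From the Manger to the Cross; or, Jesus of']
print(titles_with_genre(data, "bio"))  # []
```

Unlike a substring test, a fragment such as "bio" does not match "Biography" here.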
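Your filter compares each genre cell with {choice}, which builds a Python set, so it never matches; and even data.genre == choice would only match cells that consist of exactly that one word, while your genres are comma-separated lists such as "Biography, Crime, Drama". I would suggest checking whether the word appears in the genre string instead, with something like this (adapt it to your situation as you wish; these are only general guidelines and hints to start from):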
import pandas as pd

# Warning: some film titles contain commas and semicolons,
# so I used another separator when exporting the data to CSV;
# here I chose the vertical bar '|'.
#CSV READ & GENRE-TITLE
data = pd.read_csv("data.csv", sep="|")

choice = input('Please enter a word = ')
while choice != "exit":
    choice = choice.lower()
    found = False
    for index, row in data.iterrows():
        # Substring test: "drama" matches "Biography, Crime, Drama"
        if choice in row['genre'].lower():
            print(row['title'])
            found = True
    # Report a miss only once, after checking every row
    if not found:
        print("The movie of the genre {} doesn't exist".format(choice))
    choice = input("Please enter a word = ")
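The loop over iterrows() can also be replaced by one vectorized call, Series.str.contains, which runs the same case-insensitive substring test on the whole genre column at once. A minimal sketch, again on in-memory sample rows; titles_containing_genre is a hypothetical helper name, and na=False assumes that rows with a missing genre should simply not match:

```python
import pandas as pd

# Sample rows from the question (in-memory instead of data.csv)
data = pd.DataFrame({
    "title": ["The Story of the Kelly Gang", "Den sorte drøm", "Cleopatra", "L'Inferno"],
    "genre": ["Biography, Crime, Drama", "Drama", "Drama, History",
              "Adventure, Drama, Fantasy"],
})

def titles_containing_genre(df, choice):
    """Case-insensitive substring test on the whole genre column at once."""
    # case=False ignores case, regex=False treats choice as literal text,
    # na=False makes missing genres count as non-matching
    mask = df["genre"].str.contains(choice, case=False, regex=False, na=False)
    return df.loc[mask, "title"].tolist()

print(titles_containing_genre(data, "history"))  # ['Cleopatra']
print(titles_containing_genre(data, "western"))  # []
```

Like the loop, this is a substring test, so a partial word such as "bio" also matches "Biography".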
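To pick a random row, generate a random position. Note that randint includes both endpoints, so the upper bound has to be len(data) - 1:
from random import randint
i = randint(0, len(data) - 1)
Then use i as the position to look up a row in your DataFrame, for example data.iloc[i].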
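Two other ways to draw a random row, sketched on in-memory sample rows: random.randrange(n) returns a value from 0 to n - 1, so it is always a valid position for iloc, and DataFrame.sample lets pandas pick the row itself (random_state is only there to make the draw reproducible):

```python
import pandas as pd
from random import randrange

# Sample rows from the question (in-memory instead of data.csv)
data = pd.DataFrame({
    "title": ["The Story of the Kelly Gang", "Den sorte drøm", "Cleopatra", "L'Inferno"],
    "genre": ["Biography, Crime, Drama", "Drama", "Drama, History",
              "Adventure, Drama, Fantasy"],
})

# randrange(n) never returns n, so i is always a valid position
i = randrange(len(data))
row = data.iloc[i]  # positional lookup, independent of the index labels
print(i, row["title"])

# Let pandas draw one row; random_state fixes the seed for repeatable results
picked = data.sample(n=1, random_state=0)
print(picked["title"].iloc[0])
```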
I'll let you play around with this.