I'm creating a chatbot that reads questions from a CSV file and checks similarity using scikit-learn and NLTK. However, I get an error if the same input is entered twice.
This is the main code that takes the user input and outputs an answer to the user:
import random
import string

import nltk
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

data=pd.read_csv('FootballQA.csv')
question=data['Q'].tolist()
answer=data['A'].tolist()

lemmer = nltk.stem.WordNetLemmatizer()
#WordNet is a semantically-oriented dictionary of English included in NLTK.
def LemTokens(tokens):
    return [lemmer.lemmatize(token) for token in tokens]

# Translation table that deletes every punctuation character.
remove_punct_dict = dict((ord(punct), None) for punct in string.punctuation)
def LemNormalize(text):
    return LemTokens(nltk.word_tokenize(text.lower().translate(remove_punct_dict)))

GREETING_INPUTS = ("hello", "hi", "greetings", "sup", "what's up","hey","how are you")
GREETING_RESPONSES = ["hi", "hey", "hi there", "hello", "I am glad! You are talking to me"]
def greeting(sentence):
    for word in sentence.split():
        if word.lower() in GREETING_INPUTS:
            return random.choice(GREETING_RESPONSES)

GI = ("how are you")
GR = ["i'm fine","good,how can i help you!"]
def greet(sentence):
    for word in sentence.split():
        if word.lower() in GREETING_INPUTS:
            return random.choice(GREETING_RESPONSES)

def responses(user):
    response=''
    # Add the input as the last document so it can be compared with the questions.
    question.append(user)
    TfidfVec = TfidfVectorizer(tokenizer=LemNormalize, stop_words='english')
    tfidf = TfidfVec.fit_transform(question)
    val = cosine_similarity(tfidf[-1], tfidf)
    # The highest score is the input itself, so take the second highest.
    id1=val.argsort()[0][-2]
    flat = val.flatten()
    flat.sort()
    req = flat[-2]
    if(req==0):
        robo_response=response+"I am sorry! I don't understand you"
        return robo_response
    else:
        response = response+answer[id1]
        question.remove(user)
        return response

command=1
while(command):
    v = input("Enter your value: ")
    if(v=="exit"):
        command=0
    else:
        print(responses(str(v)))
When the program runs, it asks the user for input. The problem occurs when the same input is entered twice: if I enter "football", it first displays the output I expect, but the second time the program stops and I get this error:
Enter your value: scored
Alan shearer holds the goal record in the premier league.
Enter your value: football
I am sorry! I don't understand you
Enter your value: football
Traceback (most recent call last):
  File "C:\Users\Chris\Desktop\chatbot_simple\run.py", line 79, in <module>
    print(responses(str(v)))
  File "C:\Users\Chris\Desktop\chatbot_simple\run.py", line 68, in responses
    response = response+answer[id1]
IndexError: list index out of range
The CSV:
Q,A
Who has scored the most goals in the premier league?,Alan shearer holds the goal record in the premier league.
Who has the most appearences in the premier league?,Gareth Barry has the most appearences in premier league history.
I've tried deleting the variable after each input, but it still somehow remembers it. Does anyone have any ideas? Thanks, Chris
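You build the answer list with
answer=data['A'].tolist()
and then later on you index it with
id1=val.argsort()[0][-2]
response = response+answer[id1]
So if answer has no element at index id1, you get "list index out of range". In your case id1 >= len(answer) is true: the "I don't understand" branch returns before question.remove(user) runs, so question keeps growing while answer does not, and a repeated input matches the leftover entry, whose index is past the end of answer.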
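Because responses() appends the input to question and only removes it on the success path, one way to rule out this error is to never mutate the question list at all and score the input against the stored questions directly. The sketch below illustrates that idea with the standard library only. The word_overlap helper is a hypothetical stand-in for the TF-IDF cosine similarity in the original code, and the extra parameters of responses are assumptions made so the example is self-contained.

```python
def word_overlap(a, b):
    """Hypothetical stand-in for TF-IDF cosine similarity (Jaccard word overlap)."""
    words_a = set(a.lower().split())
    words_b = set(b.lower().split())
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def responses(user, questions, answers, similarity=word_overlap):
    fallback = "I am sorry! I don't understand you"
    if not questions:
        return fallback
    # Score the input against the stored questions without appending it,
    # so len(questions) always stays equal to len(answers).
    scores = [similarity(user, q) for q in questions]
    best = max(range(len(scores)), key=lambda i: scores[i])
    if scores[best] == 0:
        return fallback
    return answers[best]


# The two rows from the CSV in the question.
questions = [
    "Who has scored the most goals in the premier league?",
    "Who has the most appearences in the premier league?",
]
answers = [
    "Alan shearer holds the goal record in the premier league.",
    "Gareth Barry has the most appearences in premier league history.",
]

for text in ("scored", "football", "football"):
    print(responses(text, questions, answers))
```

With scikit-learn, the same idea would probably look like fitting the TfidfVectorizer on question once, calling transform([user]) on each new input, and passing that row and the fitted matrix to cosine_similarity, so every index in the result refers to a real row of the CSV.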