I am implementing a research paper in which I have to classify a cuisine based on its ingredients. A training dataset and a test dataset of ingredients are provided, and everything is working fine. I compared SGD, Random Forest, and Naive Bayes, and I am using Random Forest because its accuracy is better than that of both NB and SGD. Predictions on the test dataset work fine. Now I want to predict the cuisine from ingredients entered manually (using Python's input()). The problem comes when I try to search for an ingredient in the pandas Series Y = train_data['all_ingredients'] or Y = train_data['ingredients'].
def check_ing(ing):
    if ing in train_data['all_ingredients'].values:
        return True
    return False

no_of_ingredients = input("Total Number Of Ingredients: ")
no_of_ingredients = int(no_of_ingredients)
ingredient = []
for i in range(no_of_ingredients):
    ing = input("Enter Ingredient " + str(i) + " : ")
    if check_ing(ing) is True:
        ingredient.append(ing)
print(ingredient)
The problem is in the if statement of the function check_ing(ing).
How can I improve it so that it checks whether the ingredient entered by the user is valid?
I think this answers your question: if the input is not in the ingredients column, it is treated as invalid. You might have to adjust the condition of the if to fit your data.
EDIT: I didn't test it, but this should work. EDIT 2: I messed up copying and pasting.
# Flatten the per-recipe ingredient lists into one list of ingredients.
# Assumes each cell of the column is a list; the column name follows the question.
all_ing = [item for sublist in train_data["ingredients"] for item in sublist]

def check_ing(ing):
    # An ingredient is valid only if it occurs in the training data.
    if ing in all_ing:
        return True
    else:
        print("invalid ingredient")
        return False

no_of_ingredients = input("Total Number Of Ingredients: ")
no_of_ingredients = int(no_of_ingredients)
ingredient = []
for i in range(no_of_ingredients):
    ing = input("Enter Ingredient " + str(i) + " : ")
    tf = check_ing(ing)
    if tf is True:
        ingredient.append(ing)
print(ingredient)
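As a rough sketch of why the original check probably fails and how the lookup could be made a bit more forgiving: if each cell of the column holds a whole list of ingredients, comparing the typed string against the cells never matches a single ingredient. Flattening into a set once also makes each lookup fast, and lowercasing tolerates differences in capitalization. The helper build_ingredient_set, the extra known_ingredients parameter, and the sample data are hypothetical, and the code assumes each cell is a list of strings.

```python
def build_ingredient_set(recipes):
    """Collect every ingredient from an iterable of ingredient lists.

    `recipes` can be a pandas Series such as train_data['ingredients'],
    assuming each cell holds a list of strings (an assumption about the data).
    """
    return {item.strip().lower() for recipe in recipes for item in recipe}


def check_ing(ing, known_ingredients):
    """Return True if the ingredient occurs in at least one training recipe."""
    return ing.strip().lower() in known_ingredients


# Hypothetical sample standing in for train_data['ingredients'].
sample_recipes = [
    ["Tomato", "basil", "olive oil"],
    ["tomato", "corn", "chili"],
]
known = build_ingredient_set(sample_recipes)

# Comparing the input against whole cells never matches a single ingredient.
matches_whole_cell = any(recipe == "basil" for recipe in sample_recipes)

print(matches_whole_cell)            # False
print(check_ing(" Basil ", known))   # True
print(check_ing("saffron", known))   # False
```

With real data, known would be built once before the input loop, for example from train_data['ingredients'], and then passed to check_ing for every ingredient the user types.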