
Iterate through a particular column of a pandas DataFrame/Series


I am implementing a research paper in which I have to categorize a cuisine based on its ingredients. A training data set and a test data set of ingredients are provided. Everything is working fine. I compared SGD, Random Forest and Naive Bayes and kept the most accurate: Random Forest, whose accuracy is better than both Naive Bayes and SGD. The test data set has been evaluated and prediction works fine. Now I want to predict the cuisine from ingredients entered manually (using Python's input()). The problem comes when I try to search the pandas Series/DataFrame column, here Y = train_data['all_ingredients'] or Y = train_data['ingredients'].


def check_ing(ing):
    if ing in train_data['all_ingredients'].values:
        return True
    return False


no_of_ingredients = input("Total Number Of Ingredients: ")
no_of_ingredients = int(no_of_ingredients)
ingredient = []
for i in range(no_of_ingredients):
    ing = input("Enter Ingredient " + str(i) + " : ")
    if check_ing(ing) is True:
        ingredient.append(ing)

print(ingredient)

The problem is in the if statement of the function check_ing(ing). How can I change it so that it correctly checks whether the ingredient entered by the user is valid?

The output of Y.head() was attached as a screenshot, which is not available here.
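A likely cause, assuming each cell of train_data['ingredients'] holds a list of strings: .values returns an object array whose elements are whole lists, so `ing in ....values` compares the typed string against each complete list and never matches a single ingredient. (If 'all_ingredients' instead holds one joined string per recipe, the same exact-match problem applies.) The small DataFrame below is hypothetical and only illustrates the difference between the original check and checking against a flattened set.

```python
import pandas as pd

# Hypothetical toy data: each cell of "ingredients" holds a list of strings,
# as in a recipe dataset loaded from JSON (an assumption about train_data).
train_data = pd.DataFrame({
    "cuisine": ["italian", "mexican"],
    "ingredients": [["tomato", "basil", "olive oil"],
                    ["tortilla", "tomato", "cumin", "lime"]],
})

# Original check: the elements of .values are whole lists, so "tomato" is
# compared against each list, never against the strings inside it.
print("tomato" in train_data["ingredients"].values)  # False

# Flatten every list into one set of known ingredients; membership is then per item.
known = {ing for ings in train_data["ingredients"] for ing in ings}
print("tomato" in known)  # True
```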


Solution

  • I think this answers your question: if the input is not one of the ingredients in the ingredients column, it is reported as invalid. You might have to adapt the first part of the if to your data.

    EDIT: I didn't test it, but this should work. EDIT 2: fixed a copy-and-paste mistake.

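    # Flatten the per-recipe ingredient lists into one set of known ingredients.
    # Assumes each cell of train_data["ingredients"] is a list of strings.
    all_ing = {item for sublist in train_data["ingredients"] for item in sublist}

    def check_ing(ing):
        """Return True if ing appears in the training data, else warn and return False."""
        if ing in all_ing:
            return True
        print("invalid ingredient")
        return False

    no_of_ingredients = int(input("Total Number Of Ingredients: "))
    ingredient = []

    for i in range(no_of_ingredients):
        # strip() removes stray whitespace around the typed ingredient
        ing = input("Enter Ingredient " + str(i + 1) + " : ").strip()
        if check_ing(ing):
            ingredient.append(ing)

    print(ingredient)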
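A variant of the same idea, sketched under the assumption that each cell is a list of strings: pandas' Series.explode() (pandas 0.25+) yields one row per list element, and normalizing case and whitespace lets input such as " Tomato" match "tomato". The helper names build_vocabulary and validate, and the sample data, are hypothetical.

```python
import pandas as pd

def build_vocabulary(ingredients: pd.Series) -> set:
    """Flatten a Series of ingredient lists into a set of normalized names."""
    # explode() gives one row per list element; dropna() skips empty lists
    return set(ingredients.explode().dropna().str.strip().str.lower())

def validate(user_items, vocabulary):
    """Split user entries into (valid, invalid) lists after normalizing each one."""
    valid, invalid = [], []
    for raw in user_items:
        item = raw.strip().lower()
        if item in vocabulary:
            valid.append(item)
        else:
            invalid.append(item)
    return valid, invalid

# Hypothetical training column with inconsistent case and whitespace
train_ingredients = pd.Series([["Tomato", "basil"], ["tortilla", "cumin ", "lime"]])
vocab = build_vocabulary(train_ingredients)
print(validate([" tomato", "Cumin", "saffron"], vocab))
# (['tomato', 'cumin'], ['saffron'])
```

The returned names are lowercased; if the trained vectorizer expects the original spelling, normalize the training data the same way before fitting, or map the normalized names back.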