I am trying to add a new column to an existing dataframe. This new column will be a list of integer values that each row will hold.
I am looping through 2 dataframes to extract some records of the 2nd dataframe and store them in a list. Next, I am trying to assign this list to each row of the 1st dataframe using its index value. But the following throws an error.
1st dataframe - fitbit_dt2
2nd dataframe - recp_some
First Trial
for i in fitbit_dt2.index:
allwd_rcp = []
for j in recp_some.index:
if fitbit_dt2["Food Intolerances"][i] not in recp_some["ingredients"][j]:
allwd_rcp.append(recp_some["id"][j])
fitbit_dt2["allwd_recipes"][i] = fitbit_dt2["allwd_recipes"][i].apply(allwd_rcp)```
I even tried this - `fitbit_dt2["allwd_recipes"] = [allwd_rcp for i in fitbit_dt2.index]`
But When I do this, the same list gets assigned to all rows.
Required output of fitbit_dt2
allwd_recipes
0 [10259, 25693, 20130, 22213, 13162, 6602]
1 [39267, 11913, 20591]
2 [36341, 29369, 27564, 18515, 3335, 4499, 4906]
3 [11913, 20591, 3335, 4499]
.
.
.
Please help me correct my approach.
fitbit_dt2
id gender Food Intolerances
0 male Peanuts
1 female None
2 male None
3 female Kiwi
4 female Milk
5 male Wheat
6 male None
recp_some
id ingredients
10354 ["romaine lettuce","black olives","grape
tomatoes","garlic","pepper","purple
onion","seasoning","garbanzo beans","feta cheese crumbles"]
2645 ["water","vegetable oil","wheat","salt"]
12734 ["italian seasoning","broiler-fryer
chicken","mayonaise","zesty italian dressing"]
2941 ["sugar","hot chili","fish sauce","lime juice","peanuts"]
40254 ["bananas","pineapple juice","cherries","coconut
rum","cranberry juice","full fat milk","rum","kiwi"]
Required Output of fitbit_dt2
id gender Food Intolerances allwd_recipes
0 male Peanuts [10354,2645,12734,40254]
1 female None [10354,2645,12734,2941,40254]
2 male None [10354,2645,12734,2941,40254]
3 female Kiwi [10354,2645,12734,2941]
4 female Milk [2645,12734,2941]
5 male Wheat [10354,12734,2941,40254]
6 male None [10354,2645,12734,2941,40254]
Your fitbit_dt2 dataframe has capital keys (e.g. Peanuts), but recp_some has lower cases (e.g. peanuts). So we have to make it lower to compare:
fitbit_dt2['Food Intolerances'] = fitbit_dt2['Food Intolerances'].str.lower()
Another point is that:
Function:
def get_allowed_recipes(row):
if row['Food Intolerances'] == 'None':
return recp_some['id'].tolist()
intolerances = row['Food Intolerances'].split(', ')
allowed_recipes = []
for _, recipe in recp_some.iterrows():
if all(ingredient not in intolerances for ingredient in recipe['ingredients']):
allowed_recipes.append(recipe['id'])
return allowed_recipes
Code:
import pandas as pd
data_fitbit_dt2 = {
'id': [0, 1, 2, 3, 4, 5, 6],
'gender': ['male', 'female', 'male', 'female', 'female', 'male', 'male'],
'Food Intolerances': ['Peanuts', 'None', 'None', 'Kiwi', 'Milk', 'Wheat', 'None']
}
data_recp_some = {
'id': [10354, 2645, 12734, 2941, 40254],
'ingredients': [
["romaine lettuce", "black olives", "grape tomatoes", "garlic", "pepper", "purple onion", "seasoning", "garbanzo beans", "feta cheese crumbles"],
["water", "vegetable oil", "wheat", "salt"],
["italian seasoning", "broiler-fryer chicken", "mayonaise", "zesty italian dressing"],
["sugar", "hot chili", "fish sauce", "lime juice", "peanuts"],
["bananas", "pineapple juice", "cherries", "coconut rum", "cranberry juice", "milk", "rum", "kiwi"]
]
}
fitbit_dt2 = pd.DataFrame(data_fitbit_dt2)
recp_some = pd.DataFrame(data_recp_some)
fitbit_dt2['Food Intolerances'] = fitbit_dt2['Food Intolerances'].str.lower()
def get_allowed_recipes(row):
if row['Food Intolerances'] == 'None':
return recp_some['id'].tolist()
intolerances = row['Food Intolerances'].split(', ')
allowed_recipes = []
for _, recipe in recp_some.iterrows():
if all(ingredient not in intolerances for ingredient in recipe['ingredients']):
allowed_recipes.append(recipe['id'])
return allowed_recipes
# to create the 'allwd_recipes' column
fitbit_dt2['allwd_recipes'] = fitbit_dt2.apply(get_allowed_recipes, axis=1)
print(fitbit_dt2)
Output: