Search code examples
pythondataframelambda

Is there a way to assign a list of values to each row of a Dataframe?


I am trying to add a new column to an existing dataframe. This new column will be a list of integer values that each row will hold.

I am looping through 2 dataframes to extract some records of the 2nd dataframe and store them in a list. Next, I am trying to assign this list to each row of the 1st dataframe using its index value. But the following throws an error.

1st dataframe - fitbit_dt2
2nd dataframe - recp_some

First Trial

for i in fitbit_dt2.index:
    allwd_rcp = []
    for j in recp_some.index:
        if fitbit_dt2["Food Intolerances"][i] not in recp_some["ingredients"][j]:
            allwd_rcp.append(recp_some["id"][j])
    fitbit_dt2["allwd_recipes"][i] = fitbit_dt2["allwd_recipes"][i].apply(allwd_rcp)```  

I even tried this - `fitbit_dt2["allwd_recipes"] = [allwd_rcp for i in fitbit_dt2.index]`  
But When I do this, the same list gets assigned to all rows.  

Required output of fitbit_dt2  
allwd_recipes
0   [10259, 25693, 20130, 22213, 13162, 6602]
1   [39267, 11913, 20591]
2   [36341, 29369, 27564, 18515, 3335, 4499, 4906]
3   [11913, 20591, 3335, 4499]
.
.
.

Please help me correct my approach.

fitbit_dt2
id  gender   Food Intolerances
0   male     Peanuts
1   female   None
2   male     None
3   female   Kiwi
4   female   Milk
5   male     Wheat
6   male     None

recp_some
id        ingredients
10354     ["romaine lettuce","black olives","grape 
          tomatoes","garlic","pepper","purple 
          onion","seasoning","garbanzo beans","feta cheese crumbles"]
2645      ["water","vegetable oil","wheat","salt"]
12734     ["italian seasoning","broiler-fryer 
          chicken","mayonaise","zesty italian dressing"]
2941      ["sugar","hot chili","fish sauce","lime juice","peanuts"]
40254     ["bananas","pineapple juice","cherries","coconut 
          rum","cranberry juice","full fat milk","rum","kiwi"]

Required Output of fitbit_dt2
id  gender   Food Intolerances    allwd_recipes
0   male     Peanuts              [10354,2645,12734,40254]
1   female   None                 [10354,2645,12734,2941,40254]     
2   male     None                 [10354,2645,12734,2941,40254] 
3   female   Kiwi                 [10354,2645,12734,2941]
4   female   Milk                 [2645,12734,2941]
5   male     Wheat                [10354,12734,2941,40254]
6   male     None                 [10354,2645,12734,2941,40254]

Solution

  • Your fitbit_dt2 dataframe has capital keys (e.g. Peanuts), but recp_some has lower cases (e.g. peanuts). So we have to make it lower to compare:

    fitbit_dt2['Food Intolerances'] = fitbit_dt2['Food Intolerances'].str.lower()
    

    Another point is that:

    • if fitbit_dt2 and recp_some data frames have the same words (e.g. in fitbit_dt2 has "milk", but, recp_some has "full fat milk" and "feta cheese crumbles". recp_some df should contain "milk" directly), it would be easier to solve.

    Function:

    def get_allowed_recipes(row):
        if row['Food Intolerances'] == 'None':
            return recp_some['id'].tolist()
        
        intolerances = row['Food Intolerances'].split(', ')
        allowed_recipes = []
        
        for _, recipe in recp_some.iterrows():
            if all(ingredient not in intolerances for ingredient in recipe['ingredients']):
                allowed_recipes.append(recipe['id'])
        
        return allowed_recipes
    

    Code:

    import pandas as pd
    
    data_fitbit_dt2 = {
        'id': [0, 1, 2, 3, 4, 5, 6],
        'gender': ['male', 'female', 'male', 'female', 'female', 'male', 'male'],
        'Food Intolerances': ['Peanuts', 'None', 'None', 'Kiwi', 'Milk', 'Wheat', 'None']
    }
    
    data_recp_some = {
        'id': [10354, 2645, 12734, 2941, 40254],
        'ingredients': [
            ["romaine lettuce", "black olives", "grape tomatoes", "garlic", "pepper", "purple onion", "seasoning", "garbanzo beans", "feta cheese crumbles"],
            ["water", "vegetable oil", "wheat", "salt"],
            ["italian seasoning", "broiler-fryer chicken", "mayonaise", "zesty italian dressing"],
            ["sugar", "hot chili", "fish sauce", "lime juice", "peanuts"],
            ["bananas", "pineapple juice", "cherries", "coconut rum", "cranberry juice", "milk", "rum", "kiwi"]
        ]
    }
    
    fitbit_dt2 = pd.DataFrame(data_fitbit_dt2)
    recp_some = pd.DataFrame(data_recp_some)
    
    fitbit_dt2['Food Intolerances'] = fitbit_dt2['Food Intolerances'].str.lower()
    
    def get_allowed_recipes(row):
        if row['Food Intolerances'] == 'None':
            return recp_some['id'].tolist()
        
        intolerances = row['Food Intolerances'].split(', ')
        allowed_recipes = []
        
        for _, recipe in recp_some.iterrows():
            if all(ingredient not in intolerances for ingredient in recipe['ingredients']):
                allowed_recipes.append(recipe['id'])
        
        return allowed_recipes
    
    # to create the 'allwd_recipes' column
    fitbit_dt2['allwd_recipes'] = fitbit_dt2.apply(get_allowed_recipes, axis=1)
    
    print(fitbit_dt2)
    

    Output:

    enter image description here