Search code examples
python-3.xsklearn-pandas

Create new column in pandas dataframe based on if/elif/and functions


I have searched for my exact issue to no avail. These two threads Creating a new column based on if-elif-else condition and create new pandas dataframe column based on if-else condition with a lookup guided my code though my code fails to execute.

Issues: I have a dataframe which I have example-reproduced below. Region attribute has two values only - a or b (or could have more), same for year, though region a could have both years etc. What I want to do is create a new column, "dollars", and look up the value for region, and if it is region "a" AND year is e.g 2006, takes sales in that row, and multiply with rate for that year and append value in new column - dollars. I am beginner and below is what I have so far - via function - and obviously executing the .apply function returns a ValueError: ('The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().', 'occurred at index 0'). I am particularly interested in a more efficient implementation as the dataframe is rather large and would love to optimize computational efficiency.

import pandas as np

rate_2006, rate_2007 = 100, 200


c = {
'region': ["a", "a", "a", "a", "a", "b", "b", "b", "b", "a", "b"],
'year': [2006, 2007, 2007, 2006, 2006, 2006, 2007, 2007, 2007, 2006, 2007],
'sales': [500, 100, 2990, 15, 5000, 2000, 150, 300, 250, 1005, 600]
}

df1 = pd.DataFrame(c)
df1

def new_col(row): 
    if df1["region"] == "a" and df1["year"] == 2006:
        nc = row["sales"] * rate_2006
    elif df1["region"] == "a" and df1["year"] == 2007:
        nc = row["sales"] * rate_2007
    elif df1["region"] == "b" and df1["year"] == 2006:
        nc = row["sales"] * rate_2006
    else:
        nc = row["sales"] * rate_2007
    return nc

df1["Dollars"] = df1.apply(new_col, axis=1)
df1

Solution

  • The issue maybe due to the way you are using it. I don't know if it will help you. but I have re written the code as per my knowledge that is working.

    import pandas as pd
    
    rate_2006, rate_2007 = 100, 200
    
    
    c = {
    'region': ["a", "a", "a", "a", "a", "b", "b", "b", "b", "a", "b"],
    'year': [2006, 2007, 2007, 2006, 2006, 2006, 2007, 2007, 2007, 2006, 2007],
    'sales': [500, 100, 2990, 15, 5000, 2000, 150, 300, 250, 1005, 600]
    }
    
    df1 = pd.DataFrame(c)
    print(df1)
    
    def new_col(value): 
        if df1.loc[value,"region"] == "a" and df1.loc[value,"year"] == 2006:
            df1.loc[value,"Dollars"] = df1.loc[value,"sales"] * rate_2006
        elif df1.loc[value,"region"] == "a" and df1.loc[value,"year"] == 2007:
            df1.loc[value,"Dollars"] = df1.loc[value,"sales"] * rate_2007
        elif df1.loc[value,"region"] == "b" and df1.loc[value,"year"] == 2006:
            df1.loc[value,"Dollars"] = df1.loc[value,"sales"] * rate_2006
        else:
            df1.loc[value,"Dollars"] = df1.loc[value,"sales"] * rate_2007
    
    for value in range(len(df1)):
        new_col(value)