I have searched for my exact issue to no avail. These two threads Creating a new column based on if-elif-else condition and create new pandas dataframe column based on if-else condition with a lookup guided my code though my code fails to execute.
Issues: I have a dataframe which I have example-reproduced below. Region attribute has two values only - a or b (or could have more), same for year, though region a could have both years etc. What I want to do is create a new column, "dollars", and look up the value for region, and if it is region "a" AND year is e.g 2006, takes sales in that row, and multiply with rate for that year and append value in new column - dollars. I am beginner and below is what I have so far - via function - and obviously executing the .apply function returns a ValueError: ('The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().', 'occurred at index 0'). I am particularly interested in a more efficient implementation as the dataframe is rather large and would love to optimize computational efficiency.
import pandas as np
rate_2006, rate_2007 = 100, 200
c = {
'region': ["a", "a", "a", "a", "a", "b", "b", "b", "b", "a", "b"],
'year': [2006, 2007, 2007, 2006, 2006, 2006, 2007, 2007, 2007, 2006, 2007],
'sales': [500, 100, 2990, 15, 5000, 2000, 150, 300, 250, 1005, 600]
}
df1 = pd.DataFrame(c)
df1
def new_col(row):
if df1["region"] == "a" and df1["year"] == 2006:
nc = row["sales"] * rate_2006
elif df1["region"] == "a" and df1["year"] == 2007:
nc = row["sales"] * rate_2007
elif df1["region"] == "b" and df1["year"] == 2006:
nc = row["sales"] * rate_2006
else:
nc = row["sales"] * rate_2007
return nc
df1["Dollars"] = df1.apply(new_col, axis=1)
df1
The issue maybe due to the way you are using it. I don't know if it will help you. but I have re written the code as per my knowledge that is working.
import pandas as pd
rate_2006, rate_2007 = 100, 200
c = {
'region': ["a", "a", "a", "a", "a", "b", "b", "b", "b", "a", "b"],
'year': [2006, 2007, 2007, 2006, 2006, 2006, 2007, 2007, 2007, 2006, 2007],
'sales': [500, 100, 2990, 15, 5000, 2000, 150, 300, 250, 1005, 600]
}
df1 = pd.DataFrame(c)
print(df1)
def new_col(value):
if df1.loc[value,"region"] == "a" and df1.loc[value,"year"] == 2006:
df1.loc[value,"Dollars"] = df1.loc[value,"sales"] * rate_2006
elif df1.loc[value,"region"] == "a" and df1.loc[value,"year"] == 2007:
df1.loc[value,"Dollars"] = df1.loc[value,"sales"] * rate_2007
elif df1.loc[value,"region"] == "b" and df1.loc[value,"year"] == 2006:
df1.loc[value,"Dollars"] = df1.loc[value,"sales"] * rate_2006
else:
df1.loc[value,"Dollars"] = df1.loc[value,"sales"] * rate_2007
for value in range(len(df1)):
new_col(value)