I am struggling to write a function that would calculate the relative difference in total revenues between 2 chosen years (2019, 2020).
Here is my data:
order_id | price | quantity| date|
1 30 2 20-03-2022
2 50 4 21-04-2020
3 45 3 20-03-2020
4 200 4 21-04-2019
5 54 2 04-03-2020
6 200 4 15-04-2019
This is what I have done, but the task is to write a function that will do that. (e.g. def dif():"
df['year'] = pd.DatetimeIndex(df['date']).year
df['revenue']=df.price*order.quantity #calculates revenue for each line of df
total_revenue = df.groupby(['year'])['revenue'].sum() #this gives me revenue by each year
rev2019 = total_revenue.loc[total_revenue["year"] = 2019 ]
rev2020 = total_revenue.loc[total_revenue["year"] = 2020 ]
dif=(rev2019/rev2020)-1
I want to have it all under one function. Would also be helpful if this function can be reused if I want to change the "year" to be a different number.
You can use numpy.where
with multiple conditions and then .sum()
like this:
np.where((df.date >= start) & (df.date <= end), df.price*df.quantity, 0).sum()
As a function:
def total_revenue(df, start, end):
return np.where((df.date >= start) & (df.date <= end), df.price*df.quantity, 0).sum()
df = pd.DataFrame({'order_id': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6}, 'price': {0: 30, 1: 50, 2: 45, 3: 200, 4: 54, 5: 200}, 'quantity': {0: 2, 1: 4, 2: 3, 3: 4, 4: 2, 5: 4}, 'date': {0: '20-03-2022', 1: '21-04-2020', 2: '20-03-2020', 3: '21-04-2019', 4: '04-03-2020', 5: '15-04-2019'}})
df.date = pd.to_datetime(df.date, format="%d-%m-%Y")
total_revenue(df, "2019", "2020") # 1600
Instead of "2019" as a string, you can also use datetime objects to get the total revenue of a certain time period.
from datetime import datetime
total_revenue(df, datetime(2019, 4, 20), datetime(2020, 5, 1)) # 1243
EDIT:
def dif(df: pd.DataFrame, start: int, end: int) -> float:
return df[(df.date.dt.year >= start) & (df.date.dt.year <= end)] \
.assign(revenue=df.price*df.quantity) \
.groupby(df.date.dt.year) \
.revenue.sum().pct_change().loc[end]
dif(df, 2019, 2020) # -0.7231
dif(df, 2020, 2022) # -0.8646