Search code examples
pythonfinance

function to calculate relative difference in revenues between 2 years


I am struggling to write a function that would calculate the relative difference in total revenues between 2 chosen years (2019, 2020).

Here is my data:

order_id | price | quantity| date|
1            30       2      20-03-2022
2            50       4      21-04-2020
3            45       3      20-03-2020
4            200      4      21-04-2019
5            54       2      04-03-2020
6            200      4      15-04-2019

This is what I have done, but the task is to write a function that will do that. (e.g. def dif():"

df['year'] = pd.DatetimeIndex(df['date']).year 
df['revenue']=df.price*order.quantity   #calculates revenue for each line of df
total_revenue = df.groupby(['year'])['revenue'].sum() #this gives me revenue by each year
rev2019 = total_revenue.loc[total_revenue["year"] = 2019 ]
rev2020 = total_revenue.loc[total_revenue["year"] = 2020 ]
dif=(rev2019/rev2020)-1

I want to have it all under one function. Would also be helpful if this function can be reused if I want to change the "year" to be a different number.


Solution

  • You can use numpy.where with multiple conditions and then .sum() like this:

    np.where((df.date >= start) & (df.date <= end), df.price*df.quantity, 0).sum()
    

    As a function:

    def total_revenue(df, start, end):
        return np.where((df.date >= start) & (df.date <= end), df.price*df.quantity, 0).sum()
    
    df = pd.DataFrame({'order_id': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6}, 'price': {0: 30, 1: 50, 2: 45, 3: 200, 4: 54, 5: 200}, 'quantity': {0: 2, 1: 4, 2: 3, 3: 4, 4: 2, 5: 4}, 'date': {0: '20-03-2022', 1: '21-04-2020', 2: '20-03-2020', 3: '21-04-2019', 4: '04-03-2020', 5: '15-04-2019'}})
    df.date = pd.to_datetime(df.date, format="%d-%m-%Y")
    
    total_revenue(df, "2019", "2020")  # 1600
    

    Instead of "2019" as a string, you can also use datetime objects to get the total revenue of a certain time period.

    from datetime import datetime
    total_revenue(df, datetime(2019, 4, 20), datetime(2020, 5, 1))  # 1243
    

    EDIT:

    def dif(df: pd.DataFrame, start: int, end: int) -> float:
        return df[(df.date.dt.year >= start) & (df.date.dt.year <= end)] \  
                .assign(revenue=df.price*df.quantity) \
                .groupby(df.date.dt.year) \
                .revenue.sum().pct_change().loc[end]
    
    dif(df, 2019, 2020)  # -0.7231
    dif(df, 2020, 2022)  # -0.8646