Search code examples
pythondateaveragepercentage

Python Percentage Change with group


I have a dataframe. And, I also have start dates for CODE. My dataframe contains prices from the start date to 08.06.2023.

For each code, I can calculate the returns for 30 - 60 - 90 days ahead of the code's own start date and find the average of those returns.

My start date is: 22.12.2022 (dd.mm.yyyy) for APPLE

My start date is: 16.11.2022 (dd.mm.yyyy) for HARLEY

My start date is: 21.10.2021 (dd.mm.yyyy) for META

DATE CODE CLOSE PRICE
08.06.2023 APPLE 8.5
07.06.2023 APPLE 7
06.06.2023 APPLE 9.3
05.06.2023 APPLE 12
04.06.2023 APPLE 3
...... APPLE ...
22.12.2022 APPLE 25
08.06.2023 HARLEY 4
07.06.2023 HARLEY 2
06.06.2023 HARLEY 1
05.06.2023 HARLEY 5
04.06.2023 HARLEY 9.5
... HARLEY ....
16.11.2022 HARLEY 4
08.06.2023 META 1.1
07.06.2023 META 2
06.06.2023 META 1
05.06.2023 META 15
04.06.2023 META 65
... META ...
21.10.2021 META 5

My expected output;

CODE START DATE AVERAGE RETURN FOR 30 DAY FROM START DATE
APPLE 22.12.2022 %2
HARLEY 16.11.2022 %4
META 21.10.2021 %-5
CODE START DATE AVERAGE RETURN FOR 60 DAY FROM START DATE
APPLE 22.12.2022 %6
HARLEY 16.11.2022 %3
META 21.10.2021 %-2
CODE START DATE AVERAGE RETURN FOR 90 DAY FROM START DATE
APPLE 22.12.2022 %-1
HARLEY 16.11.2022 %7
META 21.10.2021 %10

Solution

  • Even if your question is posed in the wrong manner, since you cannot expect that someone else do your homeworks from scratch in your place...
    Here is my solution, from which you can take inspiration.

    import pandas as pd
    import numpy as np
    from datetime import datetime, timedelta
    
    data = {'DATE': ['08.06.2023', '07.06.2023', '06.06.2023', '05.06.2023', '04.06.2023', '22.12.2022', '08.06.2023', '07.06.2023', '06.06.2023', '05.06.2023', '04.06.2023', '16.11.2022', '08.06.2023', '07.06.2023','06.06.2023', '05.06.2023','04.06.2023', '21.10.2021'],
            'CODE': ['APPLE', 'APPLE', 'APPLE', 'APPLE', 'APPLE', 'APPLE', 'HARLEY', 'HARLEY', 'HARLEY', 'HARLEY', 'HARLEY', 'HARLEY', 'META', 'META', 'META', 'META', 'META', 'META'],
            'CLOSE PRICE': [8.5, 7, 9.3, 12, 3, 25, 4, 2, 1, 5, 9.5, 4, 1.1, 2, 1, 15, 65, 5]}
    
    df = pd.DataFrame(data)
    
    df['DATE'] = pd.to_datetime(df['DATE'], format='%d.%m.%Y')
    
    start_dates = {'APPLE': datetime(2022, 12, 22),
                   'HARLEY': datetime(2022, 11, 16),
                   'META': datetime(2021, 10, 21)}
    results = {}
    for code in start_dates.keys():
        start_date = start_dates[code]
        portion_df = df.loc[df['CODE'] == code]
        if not portion_df.empty:
            prices = portion_df.loc[portion_df['DATE'] >= start_date, 'CLOSE PRICE'].values
            if len(prices) > 0:
                thirty_day_prices = prices[0:min(len(prices),30)]
                sixty_day_prices = prices[0:min(len(prices),60)]
                ninety_day_prices = prices[0:min(len(prices),90)]
                thirty_day_tot = (thirty_day_prices[-1] / thirty_day_prices[0]) - 1 if len(thirty_day_prices) > 1 else np.nan
                sixty_day_tot = (sixty_day_prices[-1] / sixty_day_prices[0]) - 1 if len(sixty_day_prices) > 1 else np.nan
                ninety_day_tot = (ninety_day_prices[-1] / ninety_day_prices[0]) - 1 if len(ninety_day_prices) > 1 else np.nan
                avg_thirty_day_return = np.nanmean(thirty_day_tot)
                avg_sixty_day_return = np.nanmean(sixty_day_tot)
                avg_ninety_day_return = np.nanmean(ninety_day_tot)
                results[code] = {'START DATE': start_date.strftime('%d.%m.%Y'),
                                 'AVERAGE GAIN FOR 30 DAY FROM START DATE': f'{avg_thirty_day_return:.0%}' if not np.isnan(avg_thirty_day_return) else 'No data',
                                 'AVERAGE GAIN FOR 60 DAY FROM START DATE': f'{avg_sixty_day_return:.0%}' if not np.isnan(avg_sixty_day_return) else 'No data',
                                 'AVERAGE GAIN FOR 90 DAY FROM START DATE': f'{avg_ninety_day_return:.0%}' if not np.isnan(avg_ninety_day_return) else 'No data'}
            else:
                print(f"No data available for {code} with starting date {start_date.strftime('%d.%m.%Y')}")
        else:
            print(f"No data available for {code}")
    
    wanted_df = pd.DataFrame.from_dict(results, orient='index')
    wanted_df.index.name = 'CODE'
    

    Hoping that it is what you wanted to achieve...Bye!