I have a dataframe. And, I also have start dates for CODE. My dataframe contains prices from the start date to 08.06.2023.
For each code, I can calculate the returns for 30 - 60 - 90 days ahead of the code's own start date and find the average of those returns.
My start date is: 22.12.2022 (dd.mm.yyyy) for APPLE
My start date is: 16.11.2022 (dd.mm.yyyy) for HARLEY
My start date is: 21.10.2021 (dd.mm.yyyy) for META
DATE | CODE | CLOSE PRICE |
---|---|---|
08.06.2023 | APPLE | 8.5 |
07.06.2023 | APPLE | 7 |
06.06.2023 | APPLE | 9.3 |
05.06.2023 | APPLE | 12 |
04.06.2023 | APPLE | 3 |
...... | APPLE | ... |
22.12.2022 | APPLE | 25 |
08.06.2023 | HARLEY | 4 |
07.06.2023 | HARLEY | 2 |
06.06.2023 | HARLEY | 1 |
05.06.2023 | HARLEY | 5 |
04.06.2023 | HARLEY | 9.5 |
... | HARLEY | .... |
16.11.2022 | HARLEY | 4 |
08.06.2023 | META | 1.1 |
07.06.2023 | META | 2 |
06.06.2023 | META | 1 |
05.06.2023 | META | 15 |
04.06.2023 | META | 65 |
... | META | ... |
21.10.2021 | META | 5 |
My expected output;
CODE | START DATE | AVERAGE RETURN FOR 30 DAY FROM START DATE |
---|---|---|
APPLE | 22.12.2022 | %2 |
HARLEY | 16.11.2022 | %4 |
META | 21.10.2021 | %-5 |
CODE | START DATE | AVERAGE RETURN FOR 60 DAY FROM START DATE |
---|---|---|
APPLE | 22.12.2022 | %6 |
HARLEY | 16.11.2022 | %3 |
META | 21.10.2021 | %-2 |
CODE | START DATE | AVERAGE RETURN FOR 90 DAY FROM START DATE |
---|---|---|
APPLE | 22.12.2022 | %-1 |
HARLEY | 16.11.2022 | %7 |
META | 21.10.2021 | %10 |
Even if your question is posed in the wrong manner, since you cannot expect that someone else do your homeworks from scratch in your place...
Here is my solution, from which you can take inspiration.
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
data = {'DATE': ['08.06.2023', '07.06.2023', '06.06.2023', '05.06.2023', '04.06.2023', '22.12.2022', '08.06.2023', '07.06.2023', '06.06.2023', '05.06.2023', '04.06.2023', '16.11.2022', '08.06.2023', '07.06.2023','06.06.2023', '05.06.2023','04.06.2023', '21.10.2021'],
'CODE': ['APPLE', 'APPLE', 'APPLE', 'APPLE', 'APPLE', 'APPLE', 'HARLEY', 'HARLEY', 'HARLEY', 'HARLEY', 'HARLEY', 'HARLEY', 'META', 'META', 'META', 'META', 'META', 'META'],
'CLOSE PRICE': [8.5, 7, 9.3, 12, 3, 25, 4, 2, 1, 5, 9.5, 4, 1.1, 2, 1, 15, 65, 5]}
df = pd.DataFrame(data)
df['DATE'] = pd.to_datetime(df['DATE'], format='%d.%m.%Y')
start_dates = {'APPLE': datetime(2022, 12, 22),
'HARLEY': datetime(2022, 11, 16),
'META': datetime(2021, 10, 21)}
results = {}
for code in start_dates.keys():
start_date = start_dates[code]
portion_df = df.loc[df['CODE'] == code]
if not portion_df.empty:
prices = portion_df.loc[portion_df['DATE'] >= start_date, 'CLOSE PRICE'].values
if len(prices) > 0:
thirty_day_prices = prices[0:min(len(prices),30)]
sixty_day_prices = prices[0:min(len(prices),60)]
ninety_day_prices = prices[0:min(len(prices),90)]
thirty_day_tot = (thirty_day_prices[-1] / thirty_day_prices[0]) - 1 if len(thirty_day_prices) > 1 else np.nan
sixty_day_tot = (sixty_day_prices[-1] / sixty_day_prices[0]) - 1 if len(sixty_day_prices) > 1 else np.nan
ninety_day_tot = (ninety_day_prices[-1] / ninety_day_prices[0]) - 1 if len(ninety_day_prices) > 1 else np.nan
avg_thirty_day_return = np.nanmean(thirty_day_tot)
avg_sixty_day_return = np.nanmean(sixty_day_tot)
avg_ninety_day_return = np.nanmean(ninety_day_tot)
results[code] = {'START DATE': start_date.strftime('%d.%m.%Y'),
'AVERAGE GAIN FOR 30 DAY FROM START DATE': f'{avg_thirty_day_return:.0%}' if not np.isnan(avg_thirty_day_return) else 'No data',
'AVERAGE GAIN FOR 60 DAY FROM START DATE': f'{avg_sixty_day_return:.0%}' if not np.isnan(avg_sixty_day_return) else 'No data',
'AVERAGE GAIN FOR 90 DAY FROM START DATE': f'{avg_ninety_day_return:.0%}' if not np.isnan(avg_ninety_day_return) else 'No data'}
else:
print(f"No data available for {code} with starting date {start_date.strftime('%d.%m.%Y')}")
else:
print(f"No data available for {code}")
wanted_df = pd.DataFrame.from_dict(results, orient='index')
wanted_df.index.name = 'CODE'
Hoping that it is what you wanted to achieve...Bye!