import yfinance as yf
import pandas as pd
data = yf.download('SPY', start='2017-12-31', end='2019-12-31')
df = data[['Adj Close']]
df.resample('2Q',closed='left').mean()
The output from resample shows quarter-end dates 30-06-2018, 31-12-2018, 30-06-2019 and 31-12-2019. But some of those dates are not within original data. The average is calculated correctly.
These should be the correct dates 29-06-2018, 31-12-2018, 28-06-2019, 30-12-2019.
How can I use resample but output the correct dates that are in the original data?
f = {'Date': 'last', 'Adj Close': 'mean'}
df.reset_index().groupby([[d.year for d in df.index],[d.month==6 for d in
df.index]],as_index=False).agg(f).sort_values('Date')
I used groupby, which outputs the correct dates (i.e. the last date of the quarter that is within the original data) but the average is calculated wrongly. Why is that?
a) df.resample('2Q', closed='left').mean()
gives
Adj Close
Date
2018-06-30 260.626654
2018-12-31 268.594670
2019-06-30 273.702913
2019-12-31 297.780020
which is equivalent to
pd.DataFrame({'Date': ['2018-06-30', '2018-12-31', '2019-06-30', '2019-12-31'],
'Adj Close': [df[(df.index <= '2018-06-29')].mean().values[0],
df[(df.index > '2018-06-29') & (df.index < '2018-12-31')].mean().values[0],
df[(df.index >= '2018-12-31') & (df.index <= '2019-06-28')].mean().values[0],
df[(df.index > '2019-06-28')].mean().values[0]]})
Date Adj Close
0 2018-06-30 260.626654
1 2018-12-31 268.594670
2 2019-06-30 273.702913
3 2019-12-31 297.780020
where the 31st December 2018 has been assigned to Q1 2019.
b) df.resample('2Q', kind='period').mean()
gives
Adj Close
Date
2018Q1 260.626654
2018Q3 268.398217
2019Q1 273.943730
2019Q3 297.780020
which is equivalent to
pd.DataFrame({'Date': ['2018Q1', '2018Q3', '2019Q1', '2019Q3'],
'Adj Close': [df[(df.index <= '2018-06-29')].mean().values[0],
df[(df.index > '2018-06-29') & (df.index <= '2018-12-31')].mean().values[0],
df[(df.index > '2018-12-31') & (df.index <= '2019-06-28')].mean().values[0],
df[(df.index > '2019-06-28')].mean().values[0]]})
Date Adj Close
0 2018Q1 260.626654
1 2018Q3 268.398217
2 2019Q1 273.943730
3 2019Q3 297.780020
as well as to
f = {'Date': 'last', 'Adj Close': 'mean'}
df.reset_index().groupby([[d.year for d in df.index], [d.quarter <= 2 for d in df.index]], as_index=False).agg(f).sort_values('Date')
Date Adj Close
1 2018-06-29 260.626654
0 2018-12-31 268.398217
3 2019-06-28 273.943730
2 2019-12-30 297.780020
where the 31st December 2018 has been assigned to Q4 2018.