So I have been studying the SP500 yearly returns with information downloaded from my quandl subscription. I have used resample() and pct_change() to study the data but my results are not coming as to what is expected for some reason.
sp500_df = quandl.get("MULTPL/SP500_REAL_PRICE_MONTH", authtoken="YOUR OWN AUTH KEY")
sp500_Y_ret_df = sp500_df['Value'].resample('Y').mean().pct_change().dropna()
The expected value for the SP 500 return for year ending 2008 should be -38.5% but my code is showing -17% for some reason? If for some reason you cannot access the data I can provide a .csv file for the data. Thanks a million for the help.
sp500_Y_ret_df.loc['2008-12-31']
output:
-0.17319465450687388
last 20 years:
sp500_Y_ret_df.tail(20)
output:
2001-12-31 -0.164631
2002-12-31 -0.164795
2003-12-31 -0.032081
2004-12-31 0.173145
2005-12-31 0.067678
2006-12-31 0.085836
2007-12-31 0.126625
2008-12-31 -0.173195
2009-12-31 -0.224552
2010-12-31 0.203406
2011-12-31 0.113738
2012-12-31 0.087221
2013-12-31 0.190603
2014-12-31 0.175436
2015-12-31 0.067610
2016-12-31 0.014868
2017-12-31 0.170363
2018-12-31 0.121093
2019-12-31 0.065247
2020-12-31 0.061747
Freq: A-DEC, Name: Value, dtype: float64
USING Random made data:
aapl_df = pd.DataFrame({
'ticker':np.repeat( ['aapl'], 2500 ),
'date':pd.date_range('1/1/2011', periods=2500, freq='D'),
'price':(np.random.randn(2500).cumsum() + 10) }).set_index('date')
aapl_df.head()
date
2011-01-01 aapl 9.011290
2011-01-02 aapl 9.092603
2011-01-03 aapl 9.139830
2011-01-04 aapl 7.782112
2011-01-05 aapl 8.316270
using 'last' as stated yielded closer results but not sure if that is pure luck
aapl_Y_ret_df = aapl_df['price'].resample('Y').last()
aapl_Y_ret_df.tail()
output
date
2013-12-31 18.535328
2014-12-31 15.201832
2015-12-31 36.040411
2016-12-31 42.272464
2017-12-31 20.421079
Freq: A-DEC, Name: price, dtype: float64
--
aapl_Y_ret_df = aapl_df['price'].resample('Y').last().pct_change()
aapl_Y_ret_df.tail()
date
2013-12-31 0.569359
2014-12-31 -0.179846
2015-12-31 1.370794
2016-12-31 0.172918
2017-12-31 -0.516918
Freq: A-DEC, Name: price, dtype: float64
'Adj Close'
, which is what the OP wants.Close
or Adj Close
, and then sum
and multiply by 100.groupby
and DataFrameGroupBy.pct_change
to get the values by year.df['Adj Close'].resample('Y').mean()
returns the mean of the 'Adj Close'
values for each year, which is not how to determine the yearly return.
-17.4%
. This is not the return.python 3.11.2
, pandas 2.0.0
import yfinance as yf
import pandas as pd
# load S&P 500 data
df = yf.download('^gspc', start='2000-01-01', end='2020-01-01').reset_index()
# display(df)
Date High Low Open Close Volume Adj Close
0 2000-01-03 1478.000000 1438.359985 1469.250000 1455.219971 931800000 1455.219971
1 2000-01-04 1455.219971 1397.430054 1455.219971 1399.420044 1009000000 1399.420044
2 2000-01-05 1413.270020 1377.680054 1399.420044 1402.109985 1085500000 1402.109985
3 2000-01-06 1411.900024 1392.099976 1402.109985 1403.449951 1092300000 1403.449951
4 2000-01-07 1441.469971 1400.729980 1403.449951 1441.469971 1225200000 1441.469971
# groupby year and determine the daily percent change by year, and add it as a column to df
df['pct_ch'] = df.groupby(df.Date.dt.year)['Adj Close'].pct_change()
# groupby year and aggregate sum of pct_ch to get the yearly return
yearly_pct_ch = df.groupby(df.Date.dt.year)['pct_ch'].sum().mul(100).reset_index().rename(columns={'pct_ch': 'cum_pct_ch_year'})
# display(yearly_pct_ch)
Date cum_pct_ch_year
0 2000 -7.274088
1 2001 -8.890805
2 2002 -23.811947
3 2003 21.552072
4 2004 9.535574
5 2005 4.295586
6 2006 11.626670
7 2007 4.860178
8 2008 -38.702107
9 2009 21.622674
10 2010 12.052038
11 2011 1.575069
12 2012 11.840560
13 2013 24.012739
14 2014 12.320664
15 2015 0.501799
16 2016 11.494988
17 2017 17.127082
18 2018 -5.822426
19 2019 26.031938