Search code examples
pythonpandaspandas-groupbypandas-datareader

Reindexing pandas_datareader df into quarterly giving random NaN's


I'm trying to resample pandas_datareader price data dfs into quarterly (from either monthly or daily), and daily is giving me sporadic NaN's (across all instruments), and monthly is giving me ALL NaN's. End goal is a dataframe with just quarterly return. Tried it with yfinance and pdr, both throw nans. Even YF on 3mo interval gives me every other NaN -- what am I missing here?

#sp = yf.download("SPY", start='2017-01-01', end="2020-01-30",interval='3mo').drop(columns=['Open','High','Low','Close','Volume'])
sp = pdr.DataReader('SPY','yahoo',start='2017-01-01').drop(columns=['Open','High','Low','Close','Volume'])
#Get slightly different NaNs, but NaNs with both datasets.

sp.reset_index(inplace=True)
sp = sp.set_index('Date').asfreq('q').reset_index()
#sp['Adj Close'].pct_change() #Gives me NaNs on same the Adj Close rows regardless of calc
sp


#To show that data is in tact, this operation is flawed somehow.
spy_base = pdr.DataReader('SPY','yahoo','2017-01-01')
spy_base.loc[spy_base['Date'].dt.strftime('%Y-%m-%d').str.startswith('2018')]

#OR simply
s = pdr.DataReader('SPY','yahoo','2018-01-01','2019-01-01')

enter image description here


Solution

  • Issue was pointed out to me as the fact that those values are missing (Likely weekends), Simple way to solve is to convert to pct_change(), then resample with sum to add the 3 months returns -- it's not perfectly accurate, but it's pretty close.

    spy = pdr.DataReader('SPY','yahoo','2017').drop(columns=.   ['Open','High','Low','Close','Volume'])
    spy = spy.pct_change()
    print(spy.head())
    spy.reset_index(inplace=True)
    spy_res = spy.set_index('Date').resample('Q').sum()
    spy = spy.set_index('Date').asfreq('q').reset_index().ffill(axis=0)
    spy