I can't figure out why the code below gives inconsistent output from a relatively simple pandas DataFrame operation. The line that seems to be at fault is:
dfResult = dfPrices/dfPrices.shift(1)
'dfPrices' and 'dfResult' are both DataFrames.
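For context on what that line computes: `shift(1)` moves every column down one row, so `dfPrices/dfPrices.shift(1)` divides each price by the previous row's price (a gross one-period return), and the first row has no predecessor and becomes NaN. The operation is deterministic, so identical inputs should always give bit-identical outputs. A minimal sketch on made-up prices for two hypothetical tickers, `AAA` and `BBB`, also comparing against `pct_change() + 1`, which should agree up to floating-point rounding:

```python
import numpy as np
import pandas as pd

# Hypothetical prices for two made-up tickers, indexed by date.
prices = pd.DataFrame(
    {"AAA": [10.0, 11.0, 12.1], "BBB": [50.0, 40.0, 50.0]},
    index=pd.date_range("2014-06-27", periods=3, freq="D"),
)

# Each row divided by the row before it; the first row becomes NaN.
ratios = prices / prices.shift(1)

# Same quantity via pct_change(), which returns (ratio - 1).
# fill_method=None disables the forward-fill that older defaults applied.
same = prices.pct_change(fill_method=None) + 1

print(ratios)
```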
The main code first retrieves price data and stores it as a pandas Panel. Then, using that same fixed data, I loop 1,000 times over a simple DataFrame division that should yield the same result every time, and print the value whenever it differs from the first result. Out of 1,000 loops I usually get 5-20 inconsistent outputs. Most of them are 0.0, but sometimes they are some other non-zero number. So the error rate is about 1% on average, but with more complex operations or more downloaded data it can reach 10%. Could there be a bug in pandas, or is it my code?
import pandas as pd
import pandas_datareader.data as web

startDate = pd.datetime(2007,7,1)
endDate = pd.datetime(2014,7,1)
stockList = ['RWX','VNQ','IJJ','IVW','VWO','IVE','TLT','GLD','SHY']
data = web.DataReader(stockList, 'yahoo', startDate,endDate)

#The for loop below is not necessary, it's just filling out some NaN values
for i in data.items:
    data.loc[i,:,:].fillna(method='ffill', inplace=True)

dfPrices = data['Adj Close']
dfResult = dfPrices/dfPrices.shift(1)
reference = dfResult.loc[:,'GLD'][-1]
print 'Reference: '+str(reference)

for i in xrange(1000):
    dfResult = dfPrices/dfPrices.shift(1)
    actualResult = dfResult.loc[:,'GLD'][-1]
    if actualResult != reference:
        print actualResult
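The script above targets Python 2 and pandas 0.17 (`print` statement, `xrange`, `pd.Panel`, which later pandas releases removed) and needs a network download. A rough, network-free sketch of the same consistency check for current Python and pandas follows; the synthetic random-walk prices, the seed and the helper name `count_mismatches` are assumptions for illustration, not the original data. Whether pandas actually routes the division through numexpr depends on the pandas version, an internal size threshold and whether numexpr is installed, so a zero count here does not by itself prove an older environment is fine.

```python
import numpy as np
import pandas as pd

# Assumption: synthetic prices stand in for the Yahoo 'Adj Close' download.
tickers = ["RWX", "VNQ", "IJJ", "IVW", "VWO", "IVE", "TLT", "GLD", "SHY"]
rng = np.random.default_rng(0)
index = pd.bdate_range("2007-07-02", periods=2000)
log_steps = rng.normal(0.0, 0.01, size=(len(index), len(tickers)))
dfPrices = pd.DataFrame(
    100 * np.exp(np.cumsum(log_steps, axis=0)), index=index, columns=tickers
)
# DataFrame-level forward fill, replacing the per-item Panel loop.
dfPrices = dfPrices.ffill()


def count_mismatches(prices, n_iter=1000):
    """Hypothetical helper: repeat the division and count runs that differ
    from the first result anywhere in the frame (not just the last GLD value)."""
    reference = prices / prices.shift(1)
    mismatches = 0
    for _ in range(n_iter):
        result = prices / prices.shift(1)
        # equals() treats NaNs in matching positions (the first row) as equal.
        if not result.equals(reference):
            mismatches += 1
    return mismatches


print("Inconsistent runs:", count_mismatches(dfPrices))
```

Comparing whole frames with `equals()` catches a bad value in any cell, which is stricter than checking a single element as the original loop does.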
FYI, I am using Windows 10 and the Anaconda distribution, with pandas 0.17.0 and pandas-datareader 0.2.0.
I would appreciate any advice on this. Thank you.
@Jeff answered my question in a comment above: updating numexpr from version 2.4.4 to 2.4.6 made the problem go away.
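To check whether an environment is still on a numexpr older than 2.4.6 (the version from the comment), a small sketch follows; the helper names `numexpr_needs_upgrade`, `installed_numexpr_version` and `ratio_without_numexpr` are hypothetical. It also shows one possible workaround that is an assumption here rather than something from the thread: temporarily switching off pandas' numexpr path with the `compute.use_numexpr` option, which exists in newer pandas releases but not in 0.17.

```python
import re
from importlib.metadata import PackageNotFoundError, version

import pandas as pd


def numexpr_needs_upgrade(ver, minimum=(2, 4, 6)):
    """Hypothetical helper: True if a numexpr version string is older than `minimum`."""
    # Keep the leading dotted numeric part, e.g. "2.8.4rc1" -> "2.8.4".
    match = re.match(r"\d+(?:\.\d+)*", ver)
    if match is None:
        raise ValueError(f"unrecognised version string: {ver!r}")
    parts = tuple(int(p) for p in match.group().split("."))
    parts = (parts + (0, 0, 0))[:3]  # pad "2.8" to (2, 8, 0)
    return parts < minimum


def installed_numexpr_version():
    """Return the installed numexpr version string, or None if not installed."""
    try:
        return version("numexpr")
    except PackageNotFoundError:
        return None


def ratio_without_numexpr(prices):
    """Compute prices / prices.shift(1) with pandas' numexpr path disabled."""
    # 'compute.use_numexpr' is only available in newer pandas, not 0.17.
    with pd.option_context("compute.use_numexpr", False):
        return prices / prices.shift(1)


if __name__ == "__main__":
    ver = installed_numexpr_version()
    if ver is None:
        print("numexpr not installed; pandas uses plain NumPy operations")
    elif numexpr_needs_upgrade(ver):
        print(f"numexpr {ver} is older than 2.4.6; consider upgrading")
    else:
        print(f"numexpr {ver} is at or above 2.4.6")
```

Upgrading numexpr, as in the comment, fixes the root cause; disabling numexpr is only a stopgap and may slow down large frame operations.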