I'm trying to get ticker data for all listings in the S&P 500, following the Python Programming for Finance tutorials (link). Unfortunately, I get the following error while running my code:
requests.exceptions.ContentDecodingError: ('Received response with
content-encoding: gzip, but failed to decode it.', error('Error -3 while
decompressing data: incorrect data check',))
I guess that this issue comes from different encodings for different stocks. How can I alter my code (shown below) to handle the gzip decoding?
import bs4 as bs
import pickle
import requests
import datetime as dt
import os
import pandas as pd
import pandas_datareader.data as web

def save_sp500_tickers():
    # retrieve the page source from the URL
    response = requests.get('https://en.wikipedia.org/wiki/List_of_S%26P_500_companies')
    # parse the source into a BeautifulSoup object
    soup = bs.BeautifulSoup(response.text, 'lxml')
    # find the table tag of class "wikitable sortable"
    table = soup.find('table', {'class': 'wikitable sortable'})
    tickers = []
    # walk every table row, skipping the header row at index 0
    for row in table.findAll('tr')[1:]:
        # the ticker symbol is the text of the first table cell
        ticker = row.findAll('td')[0].text
        tickers.append(ticker)
    with open("sp500tickers.pickle", "wb") as f:
        pickle.dump(tickers, f)
    print(tickers)
    return tickers

def getDataFromYahoo(reload_sp500=False):
    if reload_sp500:
        tickers = save_sp500_tickers()
    else:
        with open("sp500tickers.pickle", "rb") as f:
            tickers = pickle.load(f)
    if not os.path.exists('stock_dfs'):
        os.makedirs('stock_dfs')
    start = dt.datetime(2010, 1, 1)
    end = dt.datetime(2018, 7, 26)
    for ticker in tickers:
        print(ticker)
        if not os.path.exists('stock_dfs/{}.csv'.format(ticker)):
            df = web.DataReader(ticker, 'yahoo', start, end)
            df.to_csv('stock_dfs/{}.csv'.format(ticker))
        else:
            print('Already have {}'.format(ticker))

getDataFromYahoo()
Traceback (most recent call last):
File "C:\Users\dan gilmore\Desktop\EclipseOxygen64WCSPlugin\cherryPY\S7P\index.py", line 55, in <module>
getDataFromYahoo()
File "C:\Users\dan gilmore\Desktop\EclipseOxygen64WCSPlugin\cherryPY\S7P\index.py", line 51, in getDataFromYahoo
df = web.DataReader(ticker, 'yahoo', start, end)
File "C:\Users\dan gilmore\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas_datareader\data.py", line 311, in DataReader
session=session).read()
File "C:\Users\dan gilmore\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas_datareader\base.py", line 210, in read
params=self._get_params(self.symbols))
File "C:\Users\dan gilmore\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas_datareader\yahoo\daily.py", line 129, in _read_one_data
resp = self._get_response(url, params=params)
File "C:\Users\dan gilmore\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas_datareader\base.py", line 132, in _get_response
headers=headers)
File "C:\Users\dan gilmore\AppData\Local\Programs\Python\Python36-32\lib\site-packages\requests\sessions.py", line 525, in get
return self.request('GET', url, **kwargs)
File "C:\Users\dan gilmore\AppData\Local\Programs\Python\Python36-32\lib\site-packages\requests\sessions.py", line 512, in request
resp = self.send(prep, **send_kwargs)
File "C:\Users\dan gilmore\AppData\Local\Programs\Python\Python36-32\lib\site-packages\requests\sessions.py", line 662, in send
r.content
File "C:\Users\dan gilmore\AppData\Local\Programs\Python\Python36-32\lib\site-packages\requests\models.py", line 827, in content
self._content = b''.join(self.iter_content(CONTENT_CHUNK_SIZE)) or b''
File "C:\Users\dan gilmore\AppData\Local\Programs\Python\Python36-32\lib\site-packages\requests\models.py", line 754, in generate
raise ContentDecodingError(e)
requests.exceptions.ContentDecodingError: ('Received response with content-encoding: gzip, but failed to decode it.', error('Error -3 while decompressing data: incorrect data check',))
The root problem here is that you're following an out-of-date tutorial.
If you look at the docs for pandas-datareader, right at the top, there's a big box that says:
Warning
As of v0.6.0 Yahoo!, Google Options, Google Quotes and EDGAR have been immediately deprecated due to large changes in their API and no stable replacement.
Whenever you're following a tutorial or blog post and something doesn't work right, the first thing you should do is go look at the actual documentation for whatever they're teaching you to use. Things change, and things that wrap up web APIs change especially rapidly.
Anyway, if you scroll down to the list of data sources, you'll see that there is no Yahoo entry. But the code is still there in the source. So, rather than getting an error about no such source, you get an error a bit later from trying to use a broken source.
At the surface level, what's happening is that the datareader code is making some kind of request (you'd have to dig into the library, or maybe capture it with Wireshark, to see what the URL and headers are) that gets a response that claims to use the gzip content-encoding, but does it wrong.
Content encoding is something that's applied to the page by the web server and undone by your browser or client, usually compression, to make the page take less time to send over the network. gzip is the most common form of compression. It's a very simple format, which is why it's so commonly used (a server can gzip thousands of pages without needing a farm of supercomputers), but that also means that if something goes wrong, like the server truncating the stream 16KB in, you can't really tell what went wrong except that the gzip decompression failed.
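You can reproduce that exact failure mode with nothing but the standard library: compress some data, clobber the trailing check value, and zlib reports the same "incorrect data check" error without being able to say anything more specific. (This sketch uses zlib's own container rather than a full gzip stream, but the check-value logic, and the error message, are the same.)

```python
import zlib

# Compress some data; the last four bytes of the resulting stream
# are the Adler-32 check value computed over the uncompressed data.
good = zlib.compress(b"some page content " * 200)

# Clobber the check value to simulate a corrupted response body.
bad = good[:-4] + b"\x00\x00\x00\x00"

try:
    zlib.decompress(bad)
except zlib.error as exc:
    print(exc)  # Error -3 while decompressing data: incorrect data check
```

All the decompressor can tell you is that the bytes it reassembled don't match the checksum; it has no idea whether the server mangled the body, truncated it, or double-compressed it.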
But regardless, there's no way to fix this;[1] you have to rewrite your code to use a different data source.
If you don't understand the code well enough to do that, you have to find a more up-to-date tutorial to learn from.
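If it helps with that rewrite: whichever source you end up picking, it's worth hiding the provider behind a single function so that switching sources later is a one-line change, and catching per-ticker failures so one bad symbol doesn't abort the whole run (as it does in your current loop). The `fetch_daily` name below is made up; its body is whatever replacement call you settle on (one of pandas-datareader's still-supported readers, for instance):

```python
def fetch_daily(ticker, start, end):
    # Hypothetical placeholder: put your replacement data-source
    # call here (e.g. a still-supported pandas-datareader reader).
    raise NotImplementedError("no data source plugged in yet")

def download_all(tickers, start, end):
    """Fetch each ticker, collecting failures instead of crashing."""
    frames, failures = {}, {}
    for ticker in tickers:
        try:
            frames[ticker] = fetch_daily(ticker, start, end)
        except Exception as exc:  # one bad ticker shouldn't kill the run
            failures[ticker] = exc
    return frames, failures

frames, failures = download_all(['MMM', 'ABT'], None, None)
print(sorted(failures))  # every ticker fails until a real source is plugged in
```

With that structure, when the next provider breaks (and one will), you change one function instead of rewriting the whole script.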
[1] Unless you want to figure out the new Yahoo API, assuming there is one, and figure out how to parse it, and write a whole new pandas-datareader source, even though the experts who write that library have given up trying to deal with Yahoo…