I have this code which gets a list of symbols from wikipedia and then gets the stock data from yahoofinance. This is a simple code which was working fine up till a couple days ago but for some reason i getting the unable to read symbol
error on many stocks. Is yahoo doing this? What can i do to fix this error. I cannot ignore this because over 50 symbols were NaN and when i rerun the code different symbols show up in the error
pandas_datareader version: 0.8.1 Code:
import datetime
import pandas as pd
import numpy as np
import csv
from pandas_datareader import data as web
import matplotlib
import matplotlib.pyplot as plt
import requests
import bs4 as bs
from urllib.request import urlopen
from bs4 import BeautifulSoup
import tqdm
from pandas import DataFrame
import seaborn as sns
resp = requests.get('http://en.wikipedia.org/wiki/List_of_S%26P_500_companies')
soup = bs.BeautifulSoup(resp.text, 'lxml')
table = soup.find('table', {'class': 'wikitable sortable'})
tickers = []
for row in table.findAll('tr')[1:]:
ticker = row.findAll('td')[0].text.strip()
tickers.append(ticker)
start = datetime.date(2008,11,1)
end = datetime.date.today()
# df = web.get_data_yahoo(tickers, start, end)
df = web.DataReader(tickers, 'yahoo', start, end)
Error:
C:\ProgramData\Anaconda3\lib\site-packages\pandas_datareader\base.py:270: SymbolWarning: Failed to read symbol: 'T', replacing with NaN.
warnings.warn(msg.format(sym), SymbolWarning)
C:\ProgramData\Anaconda3\lib\site-packages\pandas_datareader\base.py:270: SymbolWarning: Failed to read symbol: 'BKR', replacing with NaN.
warnings.warn(msg.format(sym), SymbolWarning)
C:\ProgramData\Anaconda3\lib\site-packages\pandas_datareader\base.py:270: SymbolWarning: Failed to read symbol: 'BRK.B', replacing with NaN.
warnings.warn(msg.format(sym), SymbolWarning)
Looks like the date issue could be fixed by replacing tickers with a '.'
to '-'
per this github issue
Also you do not need requests
or BeautifulSoup
just use pd.read_html
I successfully created a DataFrame with no warnings or errors in python 3.6.8
and pandas 24.2
for all 505 tickers. See example below:
import pandas as pd
from pandas_datareader import data as web
import datetime
# no need for requests or BeautifulSoup use read_html
df = pd.read_html('https://en.wikipedia.org/wiki/List_of_S%26P_500_companies')[0]
# convert symbol column to list
tickers = df['Symbol'].values.tolist()
# list comprehension to replace data in strings
t = [x.replace('.', '-') for x in tickers]
start = datetime.date(2008,11,1)
end = datetime.date.today()
df2 = web.DataReader(t, 'yahoo', start, end)
Here is the list of all 505 tickers in the DataFrame:
print(*df2.columns.levels[1])
A AAL AAP AAPL ABBV ABC ABMD ABT ACN ADBE ADI ADM ADP ADS ADSK AEE AEP AES AFL AGN AIG AIV AIZ AJG AKAM ALB ALGN ALK ALL ALLE ALXN AMAT AMCR AMD AME AMG AMGN AMP AMT AMZN ANET ANSS ANTM AON AOS APA APD APH APTV ARE ARNC ATO ATVI AVB AVGO AVY AWK AXP AZO BA BAC BAX BBT BBY BDX BEN BF-B BIIB BK BKNG BKR BLK BLL BMY BR BRK-B BSX BWA BXP C CAG CAH CAT CB CBOE CBRE CBS CCI CCL CDNS CDW CE CERN CF CFG CHD CHRW CHTR CI CINF CL CLX CMA CMCSA CME CMG CMI CMS CNC CNP COF COG COO COP COST COTY CPB CPRI CPRT CRM CSCO CSX CTAS CTL CTSH CTVA CTXS CVS CVX CXO D DAL DD DE DFS DG DGX DHI DHR DIS DISCA DISCK DISH DLR DLTR DOV DOW DRE DRI DTE DUK DVA DVN DXC EA EBAY ECL ED EFX EIX EL EMN EMR EOG EQIX EQR ES ESS ETFC ETN ETR EVRG EW EXC EXPD EXPE EXR F FANG FAST FB FBHS FCX FDX FE FFIV FIS FISV FITB FLIR FLS FLT FMC FOX FOXA FRC FRT FTI FTNT FTV GD GE GILD GIS GL GLW GM GOOG GOOGL GPC GPN GPS GRMN GS GWW HAL HAS HBAN HBI HCA HD HES HFC HIG HII HLT HOG HOLX HON HP HPE HPQ HRB HRL HSIC HST HSY HUM IBM ICE IDXX IEX IFF ILMN INCY INFO INTC INTU IP IPG IPGP IQV IR IRM ISRG IT ITW IVZ JBHT JCI JEC JKHY JNJ JNPR JPM JWN K KEY KEYS KHC KIM KLAC KMB KMI KMX KO KR KSS KSU L LB LDOS LEG LEN LH LHX LIN LKQ LLY LMT LNC LNT LOW LRCX LUV LVS LW LYB M MA MAA MAC MAR MAS MCD MCHP MCK MCO MDLZ MDT MET MGM MHK MKC MKTX MLM MMC MMM MNST MO MOS MPC MRK MRO MS MSCI MSFT MSI MTB MTD MU MXIM MYL NBL NCLH NDAQ NEE NEM NFLX NI NKE NLOK NLSN NOC NOV NOW NRG NSC NTAP NTRS NUE NVDA NVR NWL NWS NWSA O OKE OMC ORCL ORLY OXY PAYX PBCT PCAR PEAK PEG PEP PFE PFG PG PGR PH PHM PKG PKI PLD PM PNC PNR PNW PPG PPL PRGO PRU PSA PSX PVH PWR PXD PYPL QCOM QRVO RCL RE REG REGN RF RHI RJF RL RMD ROK ROL ROP ROST RSG RTN SBAC SBUX SCHW SEE SHW SIVB SJM SLB SLG SNA SNPS SO SPG SPGI SRE STI STT STX STZ SWK SWKS SYF SYK SYY T TAP TDG TEL TFX TGT TIF TJX TMO TMUS TPR TRIP TROW TRV TSCO TSN TTWO TWTR TXN TXT UA UAA UAL UDR UHS ULTA UNH UNM UNP UPS URI USB UTX V VAR VFC VIAB VLO VMC VNO VRSK VRSN VRTX VTR VZ WAB WAT WBA WCG WDC WEC WELL WFC WHR WLTW WM WMB WMT WRK WU WY WYNN XEC XEL XLNX XOM XRAY XRX XYL YUM ZBH ZION ZTS
print(len(df2.columns.levels[1]))
505