I would love to have an idea of the time each action in my script takes. The script below grabs stocks with earnings releases in the next 30 days, then grabs their current stock price, and finally grabs other items I'm interested in from the yfinance API.
When I use the progress bar "trange()" from the tqdm package, I run into several problems. The script takes ages to run, and in the last chunk, where fundamental and technical data is extracted from the API, it repeats the requests x times for each stock, where x is the total number of stocks in the Symbols list.
Can someone please help me understand what's going wrong with the tqdm feature I'm trying to incorporate?
import datetime
import pandas as pd
import time
import requests
import yfinance as yf
from tqdm import trange
import sys
StartTime = time.time()
#####################################################
###                                               ###
###   Grab Stocks with Earnings in Next 30 Days   ###
###                                               ###
#####################################################
CalendarDays = 30 #<-- specify the number of calendar days you want to grab earnings release info for
tables = [] #<-- initialize an empty list to store your tables
print('1. Grabbing companies with earnings releases in the next ' + str(CalendarDays) + ' days.')
# for i in trange(CalendarDays, file = sys.stdout, desc = '1. Grabbing companies with earnings releases in the next ' + str(CalendarDays) + ' days'):
for i in range(CalendarDays): #<-- Grabs earnings release info for the next x days on the calendar
    try:
        date = (datetime.date.today() + datetime.timedelta(days = i)).isoformat() #<-- today plus i days, in ISO format
        pd.set_option('display.max_column', None)
        url = pd.read_html("https://finance.yahoo.com/calendar/earnings?day=" + date, header=0)
        table = url[0]
        table['Earnings Release Date'] = date
        tables.append(table) #<-- append each table into your list of tables
    except ValueError:
        continue
df = pd.concat(tables, ignore_index = True) #<-- take your list of tables into 1 final dataframe
df_unique = df.drop_duplicates(subset=['Symbol'], keep='first', ignore_index = True)
DataSet = df_unique.drop(['Reported EPS','Surprise(%)'], axis = 1)
Symbols = df_unique['Symbol'].to_list()
###################################
###                             ###
###   Grab Latest Stock Price   ###
###                             ###
###################################
print('2. Grabbing latest share prices for ' + str(len(Symbols)) + ' stocks.')
price_rows = [] #<-- collect one dict per symbol (DataFrame.append was removed in pandas 2.0)
# for i in trange(len(Symbols), file = sys.stdout, desc = '2. Grabbing latest stock prices'):
for symbol in Symbols:
    try:
        params = {'symbols': symbol,
                  'range': '1d',
                  'interval': '1d',
                  'indicators': 'close',
                  'includeTimestamps': 'false',
                  'includePrePost': 'false',
                  'corsDomain': 'finance.yahoo.com',
                  '.tsrc': 'finance'
                  }
        url = 'https://query1.finance.yahoo.com/v7/finance/spark'
        r = requests.get(url, params=params)
        data = r.json()
        Price = data['spark']['result'][0]['response'][0]['indicators']['quote'][0]['close'][0]
        price_rows.append({'Symbol': symbol, 'Current Price': Price})
    except KeyError:
        continue
df_temp = pd.DataFrame(price_rows, columns = ['Symbol', 'Current Price']) #<-- build the frame once, after the loop
DataSet = pd.merge(DataSet, df_temp[['Symbol', 'Current Price']], on = 'Symbol', how = "left")
###########################################
###                                     ###
###   Grab Other Important Stock Info   ###
###                                     ###
###########################################
print('3. Grabbing stock fundamental and technical metrics.')
StartTime = time.time()
info_rows = [] #<-- collect one dict per symbol (DataFrame.append was removed in pandas 2.0)
# for i in trange(len(Symbols), file = sys.stdout, desc = 'Grabbing stock fundamental and technical metrics'):
for symbol in Symbols:
    try:
        Ticker = yf.Ticker(symbol).info
        Sector = Ticker.get('sector')
        Industry = Ticker.get('industry')
        P2B = Ticker.get('priceToBook')
        P2E = Ticker.get('trailingPE')
        # print(symbol, Sector, Industry, P2B, P2E)
        info_rows.append({'Symbol': symbol,
                          'Sector': Sector,
                          'Industry': Industry,
                          'PriceToBook': P2B,
                          'PriceToEarnings': P2E,
                          })
    except KeyError: #<-- skip symbols whose data is missing
        pass
df_temp2 = pd.DataFrame(info_rows, columns = ['Symbol', 'Sector', 'Industry', 'PriceToBook', 'PriceToEarnings'])
DataSet = pd.merge(DataSet, df_temp2, on = 'Symbol', how = "left")
##############################################################################
##############################################################################
##############################################################################
ExecutionTime = (time.time() - StartTime)
print('Script is complete! This script took ' + format(str(round(ExecutionTime, 1))) + ' seconds to run.')
TodaysDate = datetime.date.today().isoformat()
You can use the tqdm function (rather than trange) to generate a progress bar over any iterable. trange(n) is shorthand for tqdm(range(n)), so it only fits loops over a numerical range. You can import tqdm like this:
from tqdm import tqdm
And use tqdm as your wrapper:
for symbol in tqdm(Symbols, file = sys.stdout, desc = '2. Grabbing latest stock prices'):
Note that you want to iterate over Symbols itself, not over range(len(Symbols)). trange is still an appropriate choice for the first part of your script, where you iterate over a specified numerical range rather than a more generic iterable.
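One plausible explanation for the repeated requests, assuming the trange line was added as an extra outer loop around the existing "for symbol in Symbols:" loop, is that the inner loop then runs once per bar step. The sketch below counts calls in both layouts. The tickers, the fetch_info helper, and the calls list are hypothetical stand-ins for your Symbols list and the yfinance request, and the ImportError fallback is only there so the sketch runs where tqdm is not installed.

```python
import sys

try:
    from tqdm import tqdm, trange
except ImportError:
    # Fallback so the sketch still runs without tqdm installed (no bar is drawn).
    def tqdm(iterable, **kwargs):
        return iterable

    def trange(n, **kwargs):
        return range(n)

symbols = ["AAA", "BBB", "CCC"]  # hypothetical tickers standing in for Symbols
calls = []  # records every simulated request


def fetch_info(symbol):
    """Hypothetical stand-in for a network call such as yf.Ticker(symbol).info."""
    calls.append(symbol)
    return {"Symbol": symbol}


# Nested layout: the progress loop wraps the original loop, so every
# symbol is fetched len(symbols) times.
for i in trange(len(symbols), file=sys.stdout, desc="nested"):
    for symbol in symbols:
        fetch_info(symbol)
nested_calls = len(calls)

# Wrapped layout: tqdm wraps the iterable of the one existing loop,
# so each symbol is fetched exactly once.
calls.clear()
rows = [fetch_info(symbol) for symbol in tqdm(symbols, file=sys.stdout, desc="wrapped")]
wrapped_calls = len(calls)

print(nested_calls, wrapped_calls)  # prints: 9 3
```

With three symbols the nested layout makes nine requests and the wrapped layout makes three, which would match the "x times for each stock" behaviour you describe if your loops were nested this way.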