
Status bar using tqdm


I would love to have an idea of how long each action in my script takes. The script below grabs stocks with earnings releases in the next 30 days, then grabs their current stock price, and finally grabs other items I'm interested in from the yfinance API.

When I use the progress tracker trange() from the tqdm package, I have all sorts of issues. The script takes ages to run, and in the last chunk, where fundamental and technical data is extracted from the API, the script repeats the requests x times for each stock, where x is the total number of stocks in the Symbols list.

Can someone please help me understand what's going wrong with the tqdm feature I'm trying to incorporate?

import datetime
import pandas as pd
import time
import requests
import yfinance as yf
from tqdm import trange
import sys


StartTime = time.time()


#####################################################
###                                               ###
###   Grab Stocks with Earnings in Next 30 Days   ###
###                                               ###
#####################################################

CalendarDays = 30 #<-- specify the number of calendar days you want to grab earnings release info for
tables = [] #<-- initialize an empty list to store your tables

print('1. Grabbing companies with earnings releases in the next ' + str(CalendarDays) + ' days.')

# for i in trange(CalendarDays, file = sys.stdout, desc = '1. Grabbing companies with earnings releases in the next ' + str(CalendarDays) + ' days'):

for i in range(CalendarDays): #<-- Grabs earnings release info for the next x days on the calendar
    try:
        date = (datetime.date.today() + datetime.timedelta(days=i)).isoformat() #<-- the calendar date in ISO format, as the URL requires
        pd.set_option('display.max_columns', None)
        url = pd.read_html("https://finance.yahoo.com/calendar/earnings?day=" + date, header=0)
        table = url[0]
        table['Earnings Release Date'] = date
        tables.append(table) #<-- append each table into your list of tables
    except ValueError:
        continue

df = pd.concat(tables, ignore_index = True) #<-- take your list of tables into 1 final dataframe
df_unique = df.drop_duplicates(subset=['Symbol'], keep='first', ignore_index = True)
DataSet = df_unique.drop(['Reported EPS','Surprise(%)'], axis = 1)

Symbols = df_unique['Symbol'].to_list()


###################################
###                             ###
###   Grab Latest Stock Price   ###
###                             ###
###################################

print('2. Grabbing latest share prices for ' + str(len(Symbols)) + ' stocks.')

df_temp = pd.DataFrame()

# for i in trange(len(Symbols), file = sys.stdout, desc = '2. Grabbing latest stock prices'):

for symbol in Symbols:
    try:
        params = {'symbols': symbol,
                  'range': '1d',
                  'interval': '1d',
                  'indicators': 'close',
                  'includeTimestamps': 'false',
                  'includePrePost': 'false',
                  'corsDomain': 'finance.yahoo.com',
                  '.tsrc': 'finance'
                  }

        url = 'https://query1.finance.yahoo.com/v7/finance/spark'

        r = requests.get(url, params=params)
        data = r.json()

        Price = data['spark']['result'][0]['response'][0]['indicators']['quote'][0]['close'][0]

        df_stock = pd.DataFrame({'Symbol': [symbol],
                                 'Current Price': [Price]
                                 })

        df_temp = pd.concat([df_temp, df_stock], ignore_index=True) #<-- DataFrame.append was removed in pandas 2.0
    except KeyError:
        continue


DataSet = pd.merge(DataSet, df_temp[['Symbol', 'Current Price']], on = 'Symbol', how = "left")


###########################################
###                                     ###
###   Grab Other Important Stock Info   ###
###                                     ###
###########################################

print('3. Grabbing stock fundamental and technical metrics.')

StartTime = time.time()

df_temp2 = pd.DataFrame()

# for i in trange(len(Symbols), file = sys.stdout, desc = 'Grabbing stock fundamental and technical metrics'):

for symbol in Symbols:
    try:
        Ticker = yf.Ticker(symbol).info
        Sector = Ticker.get('sector')
        Industry = Ticker.get('industry')
        P2B = Ticker.get('priceToBook')
        P2E = Ticker.get('trailingPE')
        # print(symbol, Sector, Industry, P2B, P2E)

        df_stock = pd.DataFrame({'Symbol': [symbol],
                                 'Sector': [Sector],
                                 'Industry': [Industry],
                                 'PriceToBook': [P2B],
                                 'PriceToEarnings': [P2E],
                                 })

        df_temp2 = pd.concat([df_temp2, df_stock], ignore_index=True) #<-- DataFrame.append was removed in pandas 2.0
    except KeyError:
        pass


DataSet = pd.merge(DataSet, df_temp2, on = 'Symbol', how = "left")


##############################################################################
##############################################################################
##############################################################################


ExecutionTime = (time.time() - StartTime)
print('Script is complete! This script took ' + str(round(ExecutionTime, 1)) + ' seconds to run.')


TodaysDate = datetime.date.today().isoformat()

Solution

  • You can use the tqdm function (rather than trange) to generate a progress bar over any iterable. trange is specifically used when iterating over a specified numerical range (link). So you can import like this:

    from tqdm import tqdm
    

    And use tqdm as your wrapper:

    for symbol in tqdm(Symbols, file = sys.stdout, desc = '2. Grabbing latest stock prices'):
    

    Note that you want to iterate over Symbols, not len(Symbols). trange is likely an appropriate choice for the first part of your script, as you are iterating over a specified numerical range rather than a more generic iterable.
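    Putting both pieces together, here is a minimal, self-contained sketch of how the two wrappers map onto the script's loops. The symbol list and loop bodies are stand-ins (placeholders instead of the live Yahoo Finance calls, so it runs offline):

    ```python
    import sys
    from tqdm import tqdm, trange

    CalendarDays = 3                    # stand-in for the script's 30-day window
    Symbols = ['AAPL', 'MSFT', 'GOOG']  # stand-in list; the real script scrapes this from Yahoo Finance

    # trange(n) is shorthand for tqdm(range(n)): appropriate for a numeric range
    dates = []
    for i in trange(CalendarDays, file=sys.stdout, desc='1. Grabbing earnings dates'):
        dates.append(i)  # placeholder for the pd.read_html() call

    # tqdm(iterable) wraps any iterable and yields its items unchanged,
    # so the loop body still receives each symbol directly
    prices = {}
    for symbol in tqdm(Symbols, file=sys.stdout, desc='2. Grabbing latest stock prices'):
        prices[symbol] = None  # placeholder for the requests.get() call
    ```

    Because tqdm yields the items themselves, no separate index loop is needed. This is the likely cause of the repeated requests you saw: a trange(len(Symbols)) counter wrapped around an unchanged inner for symbol in Symbols loop would perform a full pass over every symbol once per counter step.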