I need to run 20 tasks asynchronously (each task runs the same function, but with a different argument). Each task uses Python's yfinance API module. This is my current method:
1. Define a list args with 20 elements; each element is the argument to be passed to the corresponding task.
2. Define an async function get_data, which I will run 20 times with a different argument each time.
3. Define an async function main, which will use asyncio.gather to run the 20 tasks asynchronously.
And here is the (pseudo)code:
import asyncio

stocks = []
args = ['arg1', 'arg2', ... , 'arg20']

async def get_data(arg):
    stock = Stock(arg)
    # do some yfinance calls
    return stock

async def main():
    global stocks
    tasks = [asyncio.ensure_future(get_data(arg)) for arg in args]
    stocks = await asyncio.gather(*tasks)

asyncio.run(main())
print(stocks) # should be a list of 20 return values from the 20 tasks
Assume each task on its own takes 4 seconds to run. Then the 20 tasks should run in 4 seconds if it's running asynchronously. However, it is running in 80 seconds. If I remove all the async code and just run it synchronously, it runs in the same amount of time. Any help?
Thanks.
I checked the yfinance documentation and saw the requests library in its requirements; that library is not async. Its calls block, so inside a coroutine they also block the event loop, and the tasks end up running one after another. This means you should not use it with the asyncio module; use threading.Thread or concurrent.futures.ThreadPoolExecutor instead.
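To see why the async version takes as long as the synchronous one, here is a minimal sketch that does not touch yfinance at all. blocking_fetch is a hypothetical stand-in (a time.sleep) for any blocking, requests-based call, and the other function names are made up for illustration. It assumes Python 3.9+ for asyncio.to_thread.

```python
import asyncio
import time


def blocking_fetch(name: str) -> str:
    """Hypothetical stand-in for a blocking yfinance call."""
    time.sleep(0.2)  # blocks the calling thread, like a requests-based HTTP call
    return f"data for {name}"


async def fetch_blocking(name: str) -> str:
    # Blocking code called directly: the event loop cannot switch tasks here,
    # so gathered coroutines effectively run one after another.
    return blocking_fetch(name)


async def fetch_in_thread(name: str) -> str:
    # asyncio.to_thread (Python 3.9+) runs the blocking call in a worker
    # thread, leaving the event loop free to start the other tasks.
    return await asyncio.to_thread(blocking_fetch, name)


async def timed_gather(worker, names):
    """Run worker(name) for every name with gather; return results and elapsed seconds."""
    start = time.monotonic()
    results = await asyncio.gather(*(worker(n) for n in names))
    return results, time.monotonic() - start


if __name__ == "__main__":
    names = ["A", "B", "C", "D", "E"]
    _, serial = asyncio.run(timed_gather(fetch_blocking, names))
    _, threaded = asyncio.run(timed_gather(fetch_in_thread, names))
    print(f"blocking inside coroutines: {serial:.2f}s")
    print(f"offloaded to threads:       {threaded:.2f}s")
```

The first timing should be roughly the sum of the individual sleeps, while the second should be close to a single sleep, which is the same effect the thread pool gives you.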
I made the following example for you, please run it and share your results.
from concurrent.futures import ThreadPoolExecutor
import yfinance as yf
from pprint import pprint
from time import monotonic


def get_stocks_data(name: str) -> dict:
    """some random function which extracts some data"""
    tick = yf.Ticker(name)
    tick_info = tick.info
    return tick_info


if __name__ == '__main__':
    # some random stocks
    stocks = [
        'AAPL', 'AMD', 'AMZN', 'FB', 'GOOG', 'MSFT', 'TSLA', 'MSFT',
        'AAPL', 'AMD', 'AMZN', 'FB', 'GOOG', 'MSFT', 'TSLA', 'MSFT',
    ]
    start_time = monotonic()
    # you can choose max_workers number higher and check if app works faster
    # e.g. choose 16 as max number of workers
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = pool.map(get_stocks_data, stocks)

    for r in results:
        pprint(r)
        print("*" * 150)

    print(monotonic() - start_time)
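One thing to be aware of with pool.map: if any call raises, the exception is re-raised when you iterate over the results, and you lose the remaining ones. If that matters for you, a possible variation is submit plus as_completed, which lets you handle failures per symbol. This is only a sketch: fetch is a hypothetical stand-in for get_stocks_data (it fails on purpose for one made-up symbol), and fetch_all is a name I picked for illustration.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch(name: str) -> dict:
    """Hypothetical stand-in for get_stocks_data; fails for one made-up symbol."""
    if name == "BAD":
        raise ValueError(f"unknown symbol: {name}")
    return {"symbol": name}


def fetch_all(names, max_workers=4):
    """Run fetch for every name in a thread pool, collecting results and errors."""
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the name it was submitted for.
        future_to_name = {pool.submit(fetch, n): n for n in names}
        for future in as_completed(future_to_name):
            name = future_to_name[future]
            try:
                results[name] = future.result()
            except Exception as exc:
                # Keep going; one failed symbol does not discard the others.
                errors[name] = exc
    # Note: results are keyed by name, so duplicate names collapse to one entry.
    return results, errors


if __name__ == "__main__":
    ok, failed = fetch_all(["AAPL", "BAD", "MSFT"])
    print(ok)
    print(failed)
```

Unlike pool.map, as_completed yields futures in completion order rather than input order, which is why the results are stored in a dict keyed by symbol.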