I have a CSV file containing the list of symbols I want to pull from the provider (about 6,000 of them). Downloading the whole list and saving each symbol to CSV takes almost 3 hours, at roughly 3-4 seconds per symbol.
I'm wondering: would it be possible / quicker to use multiprocessing or multithreading to speed this up?
What would be the correct way to apply multiprocessing or multithreading here? This is what I have so far:
    from multiprocessing import Pool

    def f():
        for ticker in tickers:
            df = get_eod_data(ticker, ex, api_key='xxxxxxxxxxxxxxxxxxx')
            df.columns = ['Open', 'High', 'Low', 'Close', 'Adj close', 'Volume']
            df.to_csv('Path\\to\\file\\{}.csv'.format(ticker))

    p = Pool(20)
    p.map(f)
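For context, each download spends almost all its time waiting on the network rather than the CPU, so a thread pool is often the lighter-weight fit for this kind of job. A minimal sketch of that shape, with a hypothetical fetch_symbol standing in for the real get_eod_data call (which needs an API key):

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_symbol(ticker):
    # Hypothetical stand-in for:
    #   df = get_eod_data(ticker, ex, api_key=...)
    #   df.to_csv(...)
    return ticker.upper()

def download_all(tickers, workers=20):
    # Threads share one process and one interpreter; the GIL is released
    # during network I/O, so many downloads can overlap.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch_symbol, tickers))
```

Note that, unlike the attempt above, the worker function takes a single ticker and the pool supplies the iterable, so each task is one download rather than the whole loop.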
Thanks !!
After a little research, this turned out to be the best way to go:
    import multiprocessing

    tickers = ['1', '2', '3', '4', '5', '6', ..... '3000']

    def f(ticker):
        df = get_eod_data(ticker, ex, api_key='xxxxxxxxxxxxxxxxxxx')
        df.columns = ['Open', 'High', 'Low', 'Close', 'Adj close', 'Volume']
        df.to_csv('Path\\to\\file\\{}.csv'.format(ticker))

    def mp_handler_1():
        p1 = multiprocessing.Pool(10)
        p1.map(f, tickers)

    if __name__ == '__main__':
        mp_handler_1()
Using multiprocessing.Pool brought the download of all symbols from the original 3-4 hours down to 35-40 minutes!! It created 10 Python processes and ran the function in parallel, with no data loss or corruption. The only downside: if this needs more memory than is available, you will get a MemoryError.
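One way to keep memory bounded is to stream results with imap_unordered instead of collecting everything through map: results come back as workers finish, so only a handful are in flight at once, and each worker returns just the ticker name rather than the whole DataFrame. A sketch under those assumptions (work is a hypothetical stand-in for the real download-and-save function):

```python
import multiprocessing

def work(ticker):
    # Real version would call get_eod_data(...) and df.to_csv(...),
    # then return only the ticker name, not the DataFrame, to the parent.
    return ticker

def run(tickers, workers=10):
    done = []
    with multiprocessing.Pool(workers) as pool:
        # chunksize batches tasks to cut inter-process overhead; results
        # arrive as they complete, in no particular order.
        for name in pool.imap_unordered(work, tickers, chunksize=50):
            done.append(name)
    return sorted(done)

if __name__ == '__main__':
    run(['1', '2', '3'])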