In the code below, I am considering using multi-threading or multi-processing for fetching from the URL. I think pools would be ideal. Can anyone help suggest a solution?
Idea: pool of threads/processes, collect the data. My preference is process over thread, but I'm not sure.
import urllib

URL = "http://download.finance.yahoo.com/d/quotes.csv?s=%s&f=sl1t1v&e=.csv"
symbols = ('GGP', 'JPM', 'AIG', 'AMZN', 'GGP', 'JPM', 'AIG', 'AMZN')
#symbols = ('GGP',)  # note the trailing comma: ('GGP') is just a string, not a tuple

def fetch_quote(symbols):
    url = URL % '+'.join(symbols)
    fp = urllib.urlopen(url)
    try:
        data = fp.read()
    finally:
        fp.close()
    return data

def main():
    data_fp = fetch_quote(symbols)
    # print data_fp

if __name__ == '__main__':
    main()
You have a single process that requests several pieces of information at once. Let's first fetch this information one by one. Your code becomes:
def fetch_quote(symbols):
    url = URL % '+'.join(symbols)
    fp = urllib.urlopen(url)
    try:
        data = fp.read()
    finally:
        fp.close()
    return data

def main():
    for symbol in symbols:
        data_fp = fetch_quote((symbol,))
        print data_fp

if __name__ == "__main__":
    main()
So main() calls every URL one by one to get the data. Let's multiprocess it with a pool:
import urllib
from multiprocessing import Pool

URL = "http://download.finance.yahoo.com/d/quotes.csv?s=%s&f=sl1t1v&e=.csv"
symbols = ('GGP', 'JPM', 'AIG', 'AMZN', 'GGP', 'JPM', 'AIG', 'AMZN')

def fetch_quote(symbols):
    url = URL % '+'.join(symbols)
    fp = urllib.urlopen(url)
    try:
        data = fp.read()
    finally:
        fp.close()
    return data

def main():
    pool = Pool(processes=5)
    # Submit every request first so the workers run concurrently;
    # calling result.get() right after each apply_async would serialize them.
    results = [pool.apply_async(fetch_quote, [(symbol,)]) for symbol in symbols]
    for result in results:
        print result.get(timeout=10)  # allow up to 10 seconds per response

if __name__ == '__main__':
    main()
In main() above, one task per symbol is submitted to a pool of five worker processes, so up to five URLs are requested in parallel.
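If you don't need per-result timeouts, Pool.map expresses the same fan-out more compactly. A minimal sketch, assuming the same URL format string; fetch_one is a hypothetical helper that fetches a single symbol:

import urllib
from multiprocessing import Pool

URL = "http://download.finance.yahoo.com/d/quotes.csv?s=%s&f=sl1t1v&e=.csv"
symbols = ('GGP', 'JPM', 'AIG', 'AMZN')

def fetch_one(symbol):
    # Hypothetical helper: fetch the CSV row for a single symbol.
    fp = urllib.urlopen(URL % symbol)
    try:
        return fp.read()
    finally:
        fp.close()

if __name__ == '__main__':
    pool = Pool(processes=5)
    # map() blocks until every worker has returned, and preserves input order.
    for data in pool.map(fetch_one, symbols):
        print data

Note that functions passed to a process Pool must be defined at module top level so they can be pickled and sent to the workers.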
Note: in CPython the GIL prevents threads from executing Python bytecode in parallel, so multithreading is usually the wrong tool for CPU-bound work. These URL fetches are I/O-bound, though: the GIL is released while a thread waits on the network, so a thread pool is also a workable option here.
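As an illustration, multiprocessing.dummy provides the same Pool API backed by threads instead of processes. A minimal sketch under that assumption, reusing fetch_quote from above:

import urllib
from multiprocessing.dummy import Pool as ThreadPool  # thread-backed Pool, same API

URL = "http://download.finance.yahoo.com/d/quotes.csv?s=%s&f=sl1t1v&e=.csv"
symbols = ('GGP', 'JPM', 'AIG', 'AMZN')

def fetch_quote(symbols):
    url = URL % '+'.join(symbols)
    fp = urllib.urlopen(url)
    try:
        data = fp.read()
    finally:
        fp.close()
    return data

if __name__ == '__main__':
    pool = ThreadPool(5)
    # Each thread blocks on the network, not on the GIL, so the
    # requests overlap; map() returns once all of them are done.
    for data in pool.map(fetch_quote, [(s,) for s in symbols]):
        print data

Threads avoid the process start-up and pickling costs, which makes them a common choice for this kind of I/O-bound fan-out.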
For documentation, see: multiprocessing in Python