I'm learning how to use concurrent.futures with executor.map() and executor.submit().

I have a list that contains 20 URLs and I want to send 20 requests at the same time. The problem is that .submit() returns results in a different order than the given list. I've read that map() does what I need, but I don't know how to write code with it.

The code below works perfectly for me.

Questions: is there a map() equivalent of the code below, or a way to sort the results from submit() into the order of the given list?
import concurrent.futures
import urllib.request

URLS = ['http://www.foxnews.com/',
        'http://www.cnn.com/',
        'http://europe.wsj.com/',
        'http://www.bbc.co.uk/',
        'http://some-made-up-domain.com/']

# Retrieve a single page and report the url and contents
def load_url(url, timeout):
    with urllib.request.urlopen(url, timeout=timeout) as conn:
        return conn.read()

# We can use a with statement to ensure threads are cleaned up promptly
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    # Start the load operations and mark each future with its URL
    future_to_url = {executor.submit(load_url, url, 60): url for url in URLS}
    for future in concurrent.futures.as_completed(future_to_url):
        url = future_to_url[future]
        try:
            data = future.result()
        except Exception as exc:
            print('%r generated an exception: %s' % (url, exc))
        else:
            print('%r page is %d bytes' % (url, len(data)))
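To the sorting question first: with submit() you don't actually need to sort anything after the fact. If you keep the futures in a plain list (instead of the dict used with as_completed()), iterating that list yields results in input order. A minimal sketch, reusing load_url and URLS from the code above:

with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    # submit in list order; iterating the list (not as_completed) preserves input order
    futures = [executor.submit(load_url, url, 60) for url in URLS]
    for url, future in zip(URLS, futures):
        try:
            data = future.result()  # blocks until this particular download finishes
        except Exception as exc:
            print('%r generated an exception: %s' % (url, exc))
        else:
            print('%r page is %d bytes' % (url, len(data)))

The downloads still run concurrently; only the result handling happens in list order, with future.result() blocking until that particular download is done.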
Here is the map() version of your existing code. Note that the callback now accepts a tuple as its parameter. I added a try/except inside the callback so that iterating over the results will not raise an exception. The results are ordered according to the input list.
from concurrent.futures import ThreadPoolExecutor
import urllib.request

URLS = ['http://www.foxnews.com/',
        'http://www.cnn.com/',
        'http://www.wsj.com/',
        'http://www.bbc.co.uk/',
        'http://some-made-up-domain.com/']

# Retrieve a single page and report the url and contents
def load_url(tt):  # tt is a (url, timeout) tuple
    url, timeout = tt
    try:
        with urllib.request.urlopen(url, timeout=timeout) as conn:
            return (url, conn.read())
    except Exception as ex:
        print("Error:", url, ex)
        return (url, "")  # on error, return the url with an empty string

with ThreadPoolExecutor(max_workers=5) as executor:
    # pass url and timeout to the callback as a tuple
    results = executor.map(load_url, [(u, 60) for u in URLS])
    executor.shutdown(wait=True)  # wait for all downloads to complete
    print("Results:")
    for r in results:  # ordered results; would raise here if not handled in the callback
        print(' %r page is %d bytes' % (r[0], len(r[1])))
Output
Error: http://www.wsj.com/ HTTP Error 404: Not Found
Results:
'http://www.foxnews.com/' page is 320028 bytes
'http://www.cnn.com/' page is 1144916 bytes
'http://www.wsj.com/' page is 0 bytes
'http://www.bbc.co.uk/' page is 279418 bytes
'http://some-made-up-domain.com/' page is 64668 bytes
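If packing the url and timeout into a tuple feels awkward, map() offers two alternatives. Both sketches below assume the two-parameter load_url(url, timeout) from the question; that version has no try/except, so iterating the results would re-raise the first download error.

from concurrent.futures import ThreadPoolExecutor
from functools import partial

with ThreadPoolExecutor(max_workers=5) as executor:
    # Option 1: bind the timeout with functools.partial so map() only supplies the url
    results = executor.map(partial(load_url, timeout=60), URLS)

    # Option 2: map() zips multiple iterables positionally, like the builtin map()
    # results = executor.map(load_url, URLS, [60] * len(URLS))

    for url, data in zip(URLS, results):
        print('%r page is %d bytes' % (url, len(data)))

Either way the results come back in the order of URLS, since map() preserves input order regardless of which download finishes first.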