Tags: python, dictionary, concurrency, submit, threadpoolexecutor

Python concurrent executor.map() and submit()


I'm learning how to use concurrent.futures with executor.map() and executor.submit().

I have a list that contains 20 URLs and I want to send 20 requests at the same time. The problem is that .submit() returns results in a different order than the original list. I've read that map() does what I need, but I don't know how to write the code for it.

The code below works perfectly for me.

Questions: is there a map() equivalent of the code below, or a way to sort the result list from submit() into the order of the given list?

import concurrent.futures
import urllib.request

URLS = ['http://www.foxnews.com/',
        'http://www.cnn.com/',
        'http://europe.wsj.com/',
        'http://www.bbc.co.uk/',
        'http://some-made-up-domain.com/']

# Retrieve a single page and report the url and contents
def load_url(url, timeout):
    with urllib.request.urlopen(url, timeout=timeout) as conn:
        return conn.read()

# We can use a with statement to ensure threads are cleaned up promptly
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    # Start the load operations and mark each future with its URL
    future_to_url = {executor.submit(load_url, url, 60): url for url in URLS}
    for future in concurrent.futures.as_completed(future_to_url):
        url = future_to_url[future]
        try:
            data = future.result()
        except Exception as exc:
            print('%r generated an exception: %s' % (url, exc))
        else:
            print('%r page is %d bytes' % (url, len(data)))

Solution

  • Here is the map() version of your existing code. Note that the callback now accepts a tuple as its parameter, since map() passes a single item from the iterable to each call. I added a try/except inside the callback so that iterating over the results will not raise an error. The results are ordered according to the input list.

    from concurrent.futures import ThreadPoolExecutor
    import urllib.request

    URLS = ['http://www.foxnews.com/',
            'http://www.cnn.com/',
            'http://www.wsj.com/',
            'http://www.bbc.co.uk/',
            'http://some-made-up-domain.com/']

    # Retrieve a single page and report the url and contents.
    # map() gives the callback one item per call, so url and timeout
    # arrive packed together in a tuple.
    def load_url(tt):
        url, timeout = tt
        try:
            with urllib.request.urlopen(url, timeout=timeout) as conn:
                return (url, conn.read())
        except Exception as ex:
            print("Error:", url, ex)
            return (url, "")  # on error, return an empty string as the body

    with ThreadPoolExecutor(max_workers=5) as executor:
        # Pass url and timeout to the callback as a single tuple
        results = executor.map(load_url, [(u, 60) for u in URLS])
        executor.shutdown(wait=True)  # wait for all downloads (the with block would also do this on exit)
        print("Results:")
    for r in results:  # ordered results; any exception not handled in the callback would be re-raised here
        print('   %r page is %d bytes' % (r[0], len(r[1])))

    Output

    Error: http://www.wsj.com/ HTTP Error 404: Not Found
    Results:
       'http://www.foxnews.com/' page is 320028 bytes
       'http://www.cnn.com/' page is 1144916 bytes
       'http://www.wsj.com/' page is 0 bytes
       'http://www.bbc.co.uk/' page is 279418 bytes
       'http://some-made-up-domain.com/' page is 64668 bytes
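
  • To answer the second part of the question: you do not need to sort anything to get ordered results out of submit(). as_completed() yields futures in the order they finish, but if you instead keep the futures in a list and call result() on each one in submission order, the downloads still run concurrently and the results come back in the order of the input list. A minimal sketch, reusing load_url from the question's code:

    import concurrent.futures
    import urllib.request

    URLS = ['http://www.foxnews.com/',
            'http://www.cnn.com/',
            'http://www.bbc.co.uk/']

    def load_url(url, timeout):
        with urllib.request.urlopen(url, timeout=timeout) as conn:
            return conn.read()

    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        # Submit everything first; the list preserves submission order
        futures = [(url, executor.submit(load_url, url, 60)) for url in URLS]
        for url, future in futures:  # iterate in submission order, not completion order
            try:
                data = future.result()  # blocks until this particular download finishes
            except Exception as exc:
                print('%r generated an exception: %s' % (url, exc))
            else:
                print('%r page is %d bytes' % (url, len(data)))

    As a side note, executor.map() also accepts multiple iterables and calls the function with one item taken from each, so executor.map(load_url, URLS, [60] * len(URLS)) would let you keep the two-argument load_url instead of packing (url, timeout) into a tuple.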