Multiprocessing issue on Windows 10

I am trying to collect the homepage sizes for a list of sites using multiprocessing. Here is the code:

import time
import urllib.request
from multiprocessing import Pool, TimeoutError

start = time.time()


def sitesize(url):
    for url in sites:
        with urllib.request.urlopen(url) as u:
            page = u.read()
            print(url, len(page))


sites = [
    'https://www.yahoo.com',
    'http://www.cnn.com',
    'http://www.python.org',
    'http://www.jython.org',
    'http://www.pypy.org',
    'http://www.perl.org',
    'http://www.cisco.com',
    'http://www.facebook.com',
    'http://www.twitter.com',
    'http://arstechnica.com',
    'http://www.reuters.com',
    'http://www.abcnews.com',
    'http://www.cnbc.com',
]

if __name__ == '__main__': 

    with Pool(processes=4) as pool:
        for result in pool.imap_unordered(sitesize, sites):
            print(result)

print(f'Time taken : {time.time() - start}')

I am running Python 3.9 on a Windows 10 laptop. I am not using a venv.

The code seems to go into a loop: it executes 4 times and takes 4 times longer.

What is the error here? Can someone help?

Thanks in advance

Sachin

Solution

I think you misunderstood how pool.imap_unordered works: it calls the provided function once for each value in sites, passing that value as the argument. In your case, the function discards the url it receives and loops over every value in the sites list, so each of the 13 calls fetches all 13 sites.

You should simply do:

def sitesize(url):
    # Fetch only the single url passed in by the pool; no loop over sites here.
    with urllib.request.urlopen(url) as u:
        page = u.read()
        print(url, len(page))

See the documentation for multiprocessing.pool.Pool.imap_unordered.
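
For completeness, here is a minimal sketch of how the whole script might look with the per-URL function. Several things in it are my own assumptions rather than part of the question: the list is shortened to four of the sites, the function returns its result instead of printing it (the loop in the question prints result, which would be None for a function that only prints), a 10-second timeout is passed to urlopen, failed fetches are reported as None, and the timing is moved into a main() function.

```python
import time
import urllib.request
from multiprocessing import Pool

# Shortened copy of the list from the question, to keep the sketch brief.
SITES = [
    'https://www.yahoo.com',
    'http://www.python.org',
    'http://www.pypy.org',
    'http://www.perl.org',
]


def sitesize(url):
    """Fetch one URL and return (url, size in bytes), or (url, None) on failure."""
    try:
        # The 10-second timeout is an arbitrary choice for this sketch.
        with urllib.request.urlopen(url, timeout=10) as u:
            return url, len(u.read())
    except OSError:
        # URLError and socket timeouts are OSError subclasses.
        return url, None


def main():
    start = time.time()
    with Pool(processes=4) as pool:
        # Each call to sitesize receives exactly one element of SITES.
        for url, size in pool.imap_unordered(sitesize, SITES):
            print(url, size)
    print(f'Time taken : {time.time() - start}')


if __name__ == '__main__':
    main()
```

The reason for moving the timing code under the __main__ guard: on Windows, worker processes are started with the spawn method, which re-imports the main module in every worker. Any top-level statement outside the guard, such as the final print in the question, therefore runs once per worker as well as in the parent process.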