python, download, urllib2, python-2.x

Speed up multiple downloads with urllib2


I'm downloading multiple SMI files from a database called ZINC using a rather simple script I wrote. However, the speed doesn't look good considering the size of the files (a few kB) and my internet connection. Is there a way to speed it up?

import urllib2


def job(url):
    '''Open the URL and download the SMI file from ZINC15.'''

    u = urllib2.urlopen(url)  # Open the URL
    print 'downloading ' + url  # Print which file is being downloaded
    with open('output.smi', 'a') as local_file:
        local_file.write(u.read())


with open('data.csv') as flist:
    urls = ['http://zinc15.docking.org/substances/{}.smi'.format(line.rstrip()) for line in flist]
    map(job, urls)

Solution

  • import threading
    import Queue  # in Python 2 the module is named Queue, not queue

    MAX_THREADS = 10
    urls = Queue.Queue()

    def downloadFile():
        while not urls.empty():
            try:
                u = urls.get_nowait()
            except Queue.Empty:
                break  # another thread drained the queue between the check and the get
            job(u)


    for url in your_url_list:  # your_url_list is the list of URLs built from data.csv
        urls.put(url)

    for i in range(MAX_THREADS):  # start exactly MAX_THREADS worker threads
        t = threading.Thread(target=downloadFile)
        t.start()

    Basically it imports the threading and Queue modules; the Queue object holds the URLs to be shared across multiple threads, and each thread runs the downloadFile() function until the queue is empty.
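
    One caveat worth adding, as a sketch rather than a definitive fix: job() appends to a single output.smi, and once several threads call it at the same time their writes can interleave and corrupt the file. A minimal way to guard against that, assuming the job()/queue setup above, is to serialize only the file append with a threading.Lock (write_lock below is a name introduced here for illustration):

    import threading
    import urllib2

    write_lock = threading.Lock()  # guards the shared output file

    def job(url):
        '''Download one SMI file and append it to output.smi under a lock.'''
        u = urllib2.urlopen(url)
        print 'downloading ' + url
        data = u.read()   # fetch outside the lock so downloads stay parallel
        with write_lock:  # only the append itself is serialized
            with open('output.smi', 'a') as local_file:
                local_file.write(data)

    Keeping u.read() outside the lock means the threads still fetch in parallel and only queue up for the quick disk write. If the script needs to do anything after every file is in, you can also collect the Thread objects in a list and t.join() each one so the main thread waits for the workers to finish.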

    It should be easy to understand; if it isn't, let me know.