I'm downloading multiple SMI files from a database called ZINC using a rather simple script I wrote. However, its speed doesn't seem very good considering the size of the files (a few kB) and my internet connection. Is there a way to speed it up?
import urllib2

def job(url):
    '''Open the URL and append the SMI file from ZINC15 to output.smi.'''
    u = urllib2.urlopen(url)  # open the URL
    print 'downloading ' + url  # report which file is being downloaded
    with open('output.smi', 'a') as local_file:
        local_file.write(u.read())

with open('data.csv') as flist:
    urls = ['http://zinc15.docking.org/substances/{}.smi'.format(line.rstrip()) for line in flist]
map(job, urls)
import threading
import Queue  # renamed to queue in Python 3

MAX_THREADS = 10
urls = Queue.Queue()

def downloadFile():
    while True:
        try:
            u = urls.get_nowait()
        except Queue.Empty:
            return  # queue drained, this thread is done
        job(u)

for url in your_url_list:
    urls.put(url)

threads = [threading.Thread(target=downloadFile) for _ in range(MAX_THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait for all downloads to finish
Basically it imports the threading and Queue modules; the Queue object holds the URLs to be shared across the threads, and each thread runs downloadFile(), pulling URLs until the queue is empty. One caveat: since job() appends to the same output.smi from several threads at once, writes from different downloads may interleave; writing to per-thread files (or serializing the writes) avoids that.
Easy to understand; if it's not, let me know.
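The same fan-out can also be had with much less code via a thread pool. `multiprocessing.dummy.Pool` (a thread-backed Pool in the standard library, available in both Python 2 and 3) drops in where your `map(job, urls)` already sits. A minimal sketch, with a stand-in `job` so it runs without network access:

```python
from multiprocessing.dummy import Pool  # thread pool, not separate processes

def job(url):
    # stand-in for the real download function; returns a marker string
    return 'fetched ' + url

urls = ['http://zinc15.docking.org/substances/%d.smi' % i for i in range(5)]

pool = Pool(10)                 # 10 worker threads
results = pool.map(job, urls)   # same interface as the built-in map, in order
pool.close()
pool.join()
```

With the real download `job`, having it *return* the fetched text and writing `results` out once at the end also sidesteps the concurrent-append issue.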