I'm trying to download more than 100,000 files from an FTP server in parallel (using threads). I previously tried it with urlretrieve, as answered here, but that gave me the following error: URLError(OSError(24, 'Too many open files')). Apparently this problem is a bug (I cannot find the reference anymore), so I tried using urlopen in combination with shutil and then writing to a file that I could close myself, as described here. This seemed to work fine, but then I got the same error again: URLError(OSError(24, 'Too many open files')). I thought that whenever a write to a file is incomplete or fails, the with statement would cause the file to close itself, but apparently the files still remain open and eventually cause the script to halt.
How can I prevent this error, i.e. make sure that every file gets closed?
import csv
import urllib.request
import shutil
from multiprocessing.dummy import Pool

def url_to_filename(url):
    filename = 'patric_genomes/' + url.split('/')[-1]
    return filename

def download(url):
    url = url.strip()
    try:
        with urllib.request.urlopen(url) as response, open(url_to_filename(url), 'wb') as out_file:
            shutil.copyfileobj(response, out_file)
    except Exception as e:
        return None, e

def build_urls(id_list):
    base_url = 'ftp://some_ftp_server/'
    urls = []
    for some_id in id_list:
        url = base_url + some_id + '/' + some_id + '.fna'
        print(url)
        urls.append(url)
    return urls

if __name__ == "__main__":
    with open('full_data/genome_ids.txt') as inFile:
        reader = csv.DictReader(inFile, delimiter='\t')
        ids = {row['some_id'] for row in reader}
    urls = build_urls(ids)
    p = Pool(100)
    print(p.map(download, urls))
You may try to use contextlib to close your file, like this:
import contextlib
[ ... ]
with contextlib.closing(urllib.request.urlopen(url)) as response, open(url_to_filename(url), 'wb') as out_file:
    shutil.copyfileobj(response, out_file)
[ ... ]
According to the docs:
contextlib.closing(thing)
Return a context manager that closes thing upon completion of the block. [ ... ] without needing to explicitly close page. Even if an error occurs, page.close() will be called when the with block is exited.
***
A workaround would be to raise the open files limit on your Linux OS. Check your current hard limit for open files:
ulimit -Hn
Add the following line to your /etc/sysctl.conf file:
fs.file-max = <number>
where <number> is the new upper limit of open files you want to set.
Save and close the file, then run
sysctl -p
so that the changes take effect.
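If you prefer to keep the change scoped to the script itself, the per-process limit can also be inspected and raised (up to the hard limit) from within Python using the standard resource module; a minimal sketch:
import resource

# Current soft and hard limits on the number of open file descriptors.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print('soft limit:', soft, 'hard limit:', hard)

# Raise the soft limit to the hard limit; only root may raise the hard limit itself.
resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))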