Tags: python-requests, python-3.6, urllib3

Python 3.6.5: Requests with streaming getting stuck in iter_content even if chunk_size is specified


I have been trying to use requests v2.19.1 in Python 3.6.5 to download a ~2GB file from a remote URL. However, I keep hitting an issue where the code gets stuck indefinitely in the for loop while downloading the data.

My code snippet:

with requests.get(self.model_url, stream=True, headers=headers) as response:

    if response.status_code not in [200, 201]:
        raise Exception(
            'Error downloading model({}). Got response code {} with content {}'.format(
                self.model_id,
                response.status_code,
                response.content
            )
        )
    with open(self.download_path, 'wb') as f:
        for chunk in response.iter_content(chunk_size=1024):
            if chunk:
                f.write(chunk)

Each time I run this code, the download stops at a different point and rarely reaches completion. I have tried different chunk sizes, but the issue persists.

Some additional details:

    python -m requests.help
{
  "chardet": {
    "version": "3.0.4"
  },
  "cryptography": {
    "version": "2.3.1"
  },
  "idna": {
    "version": "2.7"
  },
  "implementation": {
    "name": "CPython",
    "version": "3.6.5"
  },
  "platform": {
    "release": "3.10.0-693.11.1.el7.x86_64",
    "system": "Linux"
  },
  "pyOpenSSL": {
    "openssl_version": "1010009f",
    "version": "18.0.0"
  },
  "requests": {
    "version": "2.19.1"
  },
  "system_ssl": {
    "version": "100020bf"
  },
  "urllib3": {
    "version": "1.23"
  },
  "using_pyopenssl": true
}

Has anyone else faced a similar issue? If so, how did you resolve it?


Solution

  • It seems that if the network is interrupted during the download, the stream stalls and the connection goes dead. But because no timeout is specified, requests keeps waiting for more packets to arrive over the dead connection. The best way I have found to handle this is to set a reasonable timeout. Once that much time passes after the last received packet, the code exits the for loop with an exception, which can then be handled.
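
A minimal sketch of this fix (the URL, file names, and the 1024-byte chunk size are illustrative; the function name is hypothetical): pass a `timeout` to `requests.get`. A `(connect, read)` tuple sets separate connect and read timeouts, and with `stream=True` the read timeout applies to each wait for new data, so a stalled stream raises an exception instead of blocking in `iter_content` forever.

```python
import requests

def download_file(url, download_path, headers=None, timeout=(5, 30)):
    """Stream a file to disk instead of hanging on a dead connection.

    timeout is a (connect, read) tuple: the read timeout applies to each
    wait for more data, so a stalled stream raises
    requests.exceptions.Timeout (or ConnectionError) rather than blocking
    in iter_content indefinitely.
    """
    with requests.get(url, stream=True, headers=headers, timeout=timeout) as response:
        response.raise_for_status()  # non-2xx responses raise HTTPError
        with open(download_path, 'wb') as f:
            for chunk in response.iter_content(chunk_size=1024):
                if chunk:  # filter out keep-alive chunks
                    f.write(chunk)

# The caller can then catch the failure and retry or report it:
# try:
#     download_file('https://example.com/model.bin', 'model.bin')
# except (requests.exceptions.Timeout, requests.exceptions.ConnectionError):
#     ...  # connection went dead mid-download; retry or raise
```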