Tags: python, amazon-s3, python-requests, openssl, urllib3

SSLError on PUT request to S3 bucket during multipart upload using Python requests library


I'm writing a Python script to upload a large file (5GB+) to an S3 bucket using presigned URLs. I have a JavaScript version of this code working, so I believe the logic and endpoints are all valid.

For each part of the file, I get a presigned multipart upload URL, then I attempt a PUT request to that URL:

offset = 0
part_number = 0
with open(file_path, 'rb') as f:
    while offset < file_size_bytes:
        # Get a presigned URL for this chunk
        get_multipart_upload_url_params = {
            "partNumber": part_number,
            "uploadId": upload_id,
            "Key": file_key,
        }
        get_multipart_upload_url_response = requests.get(GET_MULTIPART_UPLOAD_URL_ENDPOINT, params=get_multipart_upload_url_params)

        if 'uploadURL' not in get_multipart_upload_url_response.json():
            print("Error: Upload Part URL not found in response")
            sys.exit(1)

        chunk_upload_url = get_multipart_upload_url_response.json()['uploadURL']

        # Upload the chunk
        remaining_bytes = file_size_bytes - offset
        chunk_size = min(MAX_CHUNK_SIZE, remaining_bytes)
        chunk = f.read(chunk_size)
        if not chunk:
            break

        response = requests.put(chunk_upload_url, data=chunk)
        ...

When requests.put executes, I see an error that looks like:

urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='bucket-name.s3.amazonaws.com', port=443): Max retries exceeded with url: [PRESIGNED URL REDACTED] (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:2426)')))

What makes this especially confusing is that a single-part upload function using the same interface works fine:

    # Get presigned upload URL
    upload_response = requests.get(SINGLE_PART_UPLOAD_API_ENDPOINT, params={
        'filename': filename,
    }).json()

    if 'uploadURL' not in upload_response or 'Key' not in upload_response:
        print("Error: Upload URL or file key not found in response")
        sys.exit(1)

    upload_url = upload_response['uploadURL']
    file_key = upload_response['Key']

    # Upload the file using requests
    print(f"Uploading: {file_path}")
    with open(file_path, 'rb') as f:
        response = requests.put(upload_url, data=f, headers={"Content-Type": "application/octet-stream"})
    ...

Some of the things I've tried:

  • Printing the presigned multipart upload URLs and inspecting them carefully to make sure they are valid
  • Confirming that the S3 bucket URL resolves by requesting it in a web browser
  • Switching to a different request library
  • Running the script on a different computer
  • Upgrading OpenSSL, requests, and urllib3

Solution

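  • The problem was that partNumber is 1-indexed, and I was initializing the part number to 0. S3 accepts part numbers from 1 to 10,000, so the first request in the loop was invalid. Starting the counter at 1 fixed the error. I'm leaving this post up in the hope that it helps someone else in the future.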

    Reference: https://docs.aws.amazon.com/AmazonS3/latest/API/API_UploadPart.html
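
For anyone hitting the same thing, here is a minimal sketch of chunking a file with 1-based part numbers. The helper name iter_parts and the constants are my own, not part of requests or any AWS SDK. The actual presigned-URL request and PUT are left out so the numbering logic can be checked on its own.

```python
import io
from typing import BinaryIO, Iterator, Tuple

# S3 multipart uploads accept part numbers from 1 to 10,000 inclusive.
MIN_PART_NUMBER = 1
MAX_PART_NUMBER = 10_000


def iter_parts(stream: BinaryIO, chunk_size: int) -> Iterator[Tuple[int, bytes]]:
    """Yield (part_number, chunk) pairs, numbering parts from 1.

    iter_parts is a hypothetical helper for illustration only.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")

    part_number = MIN_PART_NUMBER
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break  # end of file
        if part_number > MAX_PART_NUMBER:
            raise ValueError("more than 10,000 parts needed; use a larger chunk size")
        yield part_number, chunk
        part_number += 1


if __name__ == "__main__":
    # An in-memory stream stands in for open(file_path, 'rb').
    for part_number, chunk in iter_parts(io.BytesIO(b"abcdefghij"), 4):
        print(part_number, len(chunk))
```

In the upload loop from the question, each (part_number, chunk) pair would be used to request a presigned URL for that partNumber and then PUT the chunk to it.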