python web-scraping https python-requests urllib

ConnectionResetError with Python requests and urllib libraries when accessing specific URL


I'm encountering a ConnectionResetError when attempting to access a specific URL using both Python requests and urllib libraries. Despite providing appropriate headers, the connection is being forcibly closed by the remote host. This issue occurs consistently, and I'm seeking insights into its cause and potential solutions.

Here's the code snippet I'm using:

import requests

headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7',
    'Accept-Language': 'en-US,en;q=0.9',
    'Cache-Control': 'no-cache',
    'Connection': 'keep-alive',
    'DNT': '1',
    'Pragma': 'no-cache',
    'Sec-Fetch-Dest': 'document',
    'Sec-Fetch-Mode': 'navigate',
    'Sec-Fetch-Site': 'none',
    'Sec-Fetch-User': '?1',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36',
    'sec-ch-ua': '"Google Chrome";v="123", "Not:A-Brand";v="8", "Chromium";v="123"',
    'sec-ch-ua-mobile': '?0',
    'sec-ch-ua-platform': '"Windows"',
}

response = requests.get('https://newjersey.mylicense.com/verification/Search.aspx', headers=headers)

And here's the error I'm receiving:

('Connection aborted.', ConnectionResetError(10054, 'An existing connection was forcibly closed by the remote host', None, 10054, None))

Libraries being used:

  • requests==2.31.0
  • urllib3==2.2.1

I've attempted to access the URL using both the requests and urllib libraries, providing the necessary headers to mimic a browser request. I expected the connection to be established successfully, allowing me to retrieve the desired content. However, I consistently received a ConnectionResetError indicating that the connection was forcibly closed by the remote host.

The URL in question functions as expected when accessed through a web browser, indicating that the issue may lie within the Python libraries rather than with the server itself.


Solution

  • This took a hell of a lot of troubleshooting to figure out. The main issue is that the cipher the server uses for its TLS connection ('AES256-GCM-SHA384') is not among the default ciphers that Python's ssl module offers when establishing a secure connection. With no cipher in common, the handshake fails, which causes the error you are seeing.
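
    One way to check this claim locally is to list what a default client context offers. This is a minimal sketch that only uses the standard ssl module; the result depends on your Python and OpenSSL build, so treat it as a diagnostic rather than a fixed answer. The names TARGET and default_cipher_names are just illustrative.

```python
import ssl

# Cipher named above as the one the server uses.
TARGET = "AES256-GCM-SHA384"


def default_cipher_names():
    """Return the cipher suite names a default client context will offer."""
    context = ssl.create_default_context()
    return [entry["name"] for entry in context.get_ciphers()]


names = default_cipher_names()
print(f"{len(names)} ciphers offered by default")
print(f"{TARGET} offered by default: {TARGET in names}")
```

    If the second line prints False on your machine, the client never offers the cipher the server wants.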

    Solution

    You need to use an HTTPAdapter that has the correct cipher.

    import requests
    from requests.adapters import HTTPAdapter
    from urllib3.util.ssl_ import create_urllib3_context
    
    class SoleCipherAdapter(HTTPAdapter):
        """Custom adapter that only uses 1 cipher"""
        CIPHER = 'AES256-GCM-SHA384'
    
        def init_poolmanager(self, *args, **kwargs):
            context = create_urllib3_context(ciphers=self.CIPHER)
            kwargs['ssl_context'] = context
            return super().init_poolmanager(*args, **kwargs)
    
    url = 'https://newjersey.mylicense.com/verification/Search.aspx'
    sess = requests.Session()
    sess.mount('https://', SoleCipherAdapter())
    res = sess.get(url)
    res.status_code
    # returns:
    # 200
    
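    Since the question also mentions urllib, the same idea can probably be applied there by passing a custom SSL context to urlopen. This is an untested sketch, assuming the server has no objection to urllib's default headers; make_context and fetch_status are hypothetical helper names.

```python
import ssl
import urllib.request

CIPHER = "AES256-GCM-SHA384"
URL = "https://newjersey.mylicense.com/verification/Search.aspx"


def make_context(ciphers=CIPHER):
    """Default verifying context, restricted to the given OpenSSL cipher string."""
    context = ssl.create_default_context()
    context.set_ciphers(ciphers)
    return context


def fetch_status(url=URL, timeout=10):
    """Open the URL with the custom context and return the HTTP status code."""
    with urllib.request.urlopen(url, context=make_context(), timeout=timeout) as response:
        return response.status


# Example (needs network access):
# print(fetch_status())
```

    Certificate verification and hostname checking stay enabled, because the context still comes from ssl.create_default_context().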

    Diagnosis

    Using curl works, but it is not very informative.

    C:\>curl "https://newjersey.mylicense.com/verification/Search.aspx" -vv --head
    *   Trying 208.95.153.120:443...
    * Connected to newjersey.mylicense.com (208.95.153.120) port 443
    * schannel: disabled automatic use of client certificate
    * ALPN: curl offers http/1.1
    * ALPN: server did not agree on a protocol. Uses default.
    * using HTTP/1.x
    > HEAD /verification/Search.aspx HTTP/1.1
    > Host: newjersey.mylicense.com
    > User-Agent: curl/8.4.0
    > Accept: */*
    >
    < HTTP/1.1 200 OK
    HTTP/1.1 200 OK
    ...
    

    Python fails:

    Trying to create the connection manually in Python, using the following code, fails with the same error you are seeing:

    import socket
    import ssl
    
    host = 'newjersey.mylicense.com'
    context = ssl.create_default_context()
    
    data = b"""HEAD /verification/Search.aspx HTTP/1.1
    Host: newjersey.mylicense.com
    User-Agent: python/3.11.8
    Accept: */*
    
    """
    
    with socket.create_connection((host, 443)) as sock:
        with context.wrap_socket(sock, server_hostname=host) as secure_sock:
            secure_sock.send(data)
            print(secure_sock.read().decode())
    
    # raises:
    File ~\envs\test\Lib\ssl.py:1379, in SSLSocket.do_handshake(self, block)
       1377     if timeout == 0.0 and block:
       1378         self.settimeout(None)
    -> 1379     self._sslobj.do_handshake()
       1380 finally:
       1381     self.settimeout(timeout)
    
    ConnectionResetError: [WinError 10054] An existing connection was forcibly
    closed by the remote host
    

    I turned to creating a connection manually using openssl. This is where we finally find the information needed. (It is quite verbose.)

    C:\>openssl s_client -connect newjersey.mylicense.com:443
    

    The connection succeeds and prints the following info (I have removed bits of it for brevity):

    CONNECTED(000001B4)
    depth=2 C = US, ST = Arizona, L = Scottsdale, O = "GoDaddy.com, Inc.", ...
    verify return:1
    ...
    ---
    Certificate chain
     0 s:CN = *.mylicense.com
       i:C = US, ST = Arizona, L = Scottsdale, O = "GoDaddy.com, Inc.", OU = ...
       a:PKEY: rsaEncryption, 2048 (bit); sigalg: RSA-SHA256
       v:NotBefore: May 28 22:06:00 2023 GMT; NotAfter: Jun 28 07:22:12 2024 GMT
    ...
    ---
    Server certificate
    -----BEGIN CERTIFICATE-----
    MIIGkjCCBXqgAwIBAgIJAKhBrHwkidbVMA0GCSqGSIb3DQEBCwUAMIG0MQswCQYD
    ...
    -----END CERTIFICATE-----
    subject=CN = *.mylicense.com
    issuer=...
    ---
    No client certificate CA names sent
    ---
    SSL handshake has read 4236 bytes and written 647 bytes
    Verification: OK
    ---
    New, TLSv1.2, Cipher is AES256-GCM-SHA384
    Server public key is 2048 bit
    Secure Renegotiation IS supported
    Compression: NONE
    Expansion: NONE
    No ALPN negotiated
    SSL-Session:
        Protocol  : TLSv1.2
        Cipher    : AES256-GCM-SHA384
        Session-ID: ...
        Session-ID-ctx:
        Master-Key: ...
        PSK identity: None
        PSK identity hint: None
        SRP username: None
        Start Time: 1712847797
        Timeout   : 7200 (sec)
        Verify return code: 0 (ok)
        Extended master secret: yes
    ---
    

    Once connected, we can type an HTTP request as raw input:

    ...
        Verify return code: 0 (ok)
        Extended master secret: yes
    ---
    HEAD /verification/Search.aspx HTTP/1.1
    Host: newjersey.mylicense.com
    User-Agent: python/3.11.8
    Accept: */*
    
    
    

    And it returns the response to the HEAD request:

    HTTP/1.1 200 OK
    Cache-Control: no-cache
    Pragma: no-cache
    Content-Length: 43543
    Content-Type: text/html; charset=utf-8
    Expires: -1
    Server: Microsoft-IIS/8.5
    Set-Cookie: ASP.NET_SessionId=...
    

    The important parts of the connection information are: SSL handshake has read 4236 bytes and written 647 bytes and TLSv1.2, Cipher is AES256-GCM-SHA384. The handshake is successful here, and it tells us the TLS version and cipher it used. requests (through urllib3) also supports TLS 1.2, so the protocol version is not the problem. That just left trying a different cipher.
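
    The same two facts (protocol version and cipher) can also be read from Python once a handshake succeeds, through SSLSocket.version() and SSLSocket.cipher(). Here is a rough sketch of a probe built on that; the function name probe and its parameters are my own, not part of any library.

```python
import socket
import ssl


def probe(host, ciphers=None, port=443, timeout=10):
    """Attempt a TLS handshake; return (protocol, cipher name) or None on failure."""
    context = ssl.create_default_context()
    if ciphers is not None:
        # OpenSSL cipher string, e.g. 'AES256-GCM-SHA384'
        context.set_ciphers(ciphers)
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host) as secure_sock:
                return secure_sock.version(), secure_sock.cipher()[0]
    except OSError:
        # Covers ssl.SSLError, ConnectionResetError, refused connections, timeouts.
        return None


# Example (needs network access):
# print(probe("newjersey.mylicense.com"))                       # default ciphers
# print(probe("newjersey.mylicense.com", "AES256-GCM-SHA384"))  # single cipher
```

    Comparing the two calls shows whether the cipher list alone decides if the handshake goes through.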

    This actually only requires adding one line to the previous Python code:

    import socket
    import ssl
    
    host = 'newjersey.mylicense.com'
    context = ssl.create_default_context()
    context.set_ciphers('AES256-GCM-SHA384')
    
    data = b"""HEAD /verification/Search.aspx HTTP/1.1
    Host: newjersey.mylicense.com
    User-Agent: python/3.11.8
    Accept: */*
    
    """
    
    with socket.create_connection((host, 443)) as sock:
        with context.wrap_socket(sock, server_hostname=host) as secure_sock:
            secure_sock.send(data)
            print(secure_sock.read().decode())
    

    And finally we get the expected output:

    HTTP/1.1 200 OK
    Cache-Control: no-cache
    Pragma: no-cache
    Content-Length: 43543
    Content-Type: text/html; charset=utf-8
    Expires: -1
    Server: Microsoft-IIS/8.5
    Set-Cookie: ASP.NET_SessionId=iejmoxg
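
    A side note on the raw request bytes used in the socket snippets above: they use bare newlines, which this server accepted, but HTTP/1.1 specifies CRLF line endings. If another server is stricter, the request could be built like this. It is only a sketch; build_head_request is a made-up helper, and the Connection: close header is my addition, not part of the original request.

```python
def build_head_request(host, path, user_agent="python/3.11.8"):
    """Build a raw HTTP/1.1 HEAD request with CRLF line endings."""
    lines = [
        f"HEAD {path} HTTP/1.1",
        f"Host: {host}",
        f"User-Agent: {user_agent}",
        "Accept: */*",
        "Connection: close",  # assumption: ask the server to close after replying
        "",  # the two empty items produce the blank line
        "",  # that terminates the header block
    ]
    return "\r\n".join(lines).encode("ascii")


data = build_head_request("newjersey.mylicense.com", "/verification/Search.aspx")
```

    The resulting data can be passed to secure_sock.sendall() in place of the triple-quoted bytes literal.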