Tags: python, https, urllib2, urlopen

Urllib2 HTTPS truncated response


I am trying to fetch a page using urllib2.urlopen (actually, I am using mechanize, but this is the method that mechanize calls). When I fetch the page over HTTPS, the response is incomplete: the page is truncated. However, if I access the non-HTTPS version of the page, I get the complete page.

I am on Arch Linux (3.5.4-1-ARCH x86_64), running openssl 1.0.1c. The problem also occurs on another Arch Linux machine I own, but not when I use Python 3 (3.3.0).

This problem seems to be related to the question "urllib2 not retrieving entire HTTP response".

I tested it on the only online Python interpreter that would let me use urllib2 (Py I/O) and it worked as expected. Here is the code:

import urllib2

u = urllib2.urlopen('https://wa151.avayalive.com/WAAdminPanel/login.aspx?ReturnUrl=%2fWAAdminPanel%2fprivate%2fHome.aspx')

print u.read()[-100:]

The last lines should contain the usual </body></html>.

When I try urllib.urlretrieve on my machines, I get:

ContentTooShortError: retrieval incomplete: got only 11365 out of 13805 bytes

I cannot test urlretrieve on the online interpreter because it will not let users write to temporary files. Later in the evening, I will try fetching the URL from my machine, but from a different location.
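
For context, the "got only 11365 out of 13805 bytes" message reflects a comparison between the Content-Length header and the number of bytes actually read. A minimal sketch of that comparison follows. The names missing_bytes and fetch_and_check are hypothetical helpers, not part of urllib2, and the import fallback is only there so the same code also loads on Python 3.

```python
try:
    from urllib2 import urlopen  # Python 2
except ImportError:
    from urllib.request import urlopen  # Python 3


def missing_bytes(content_length, body):
    """Return how many bytes are missing relative to the declared length.

    content_length is the raw Content-Length header value (a string),
    or None when the server did not send one.
    """
    if content_length is None:
        return None
    return max(int(content_length) - len(body), 0)


def fetch_and_check(url):
    """Fetch url and return (bytes_received, bytes_missing)."""
    response = urlopen(url)
    try:
        body = response.read()
        declared = response.headers.get('Content-Length')
    finally:
        response.close()
    return len(body), missing_bytes(declared, body)
```

With the numbers from the error above, missing_bytes('13805', b'x' * 11365) reports 2440 missing bytes. On Python 3, read() may raise http.client.IncompleteRead for a short body rather than return it, so the second value is mostly informative on Python 2.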


Solution

  • I'm getting the same error, using Python 2.7, on a different Linux system:

    >>> import urllib
    >>> urllib.urlretrieve('https://wa151.avayalive.com/WAAdminPanel/login.aspx?ReturnUrl=%2fWAAdminPanel%2fprivate%2fHome.aspx')
    ---------------------------------------------------------------------------
    ContentTooShortError                      Traceback (most recent call last)
    ...
    ContentTooShortError: retrieval incomplete: got only 11365 out of 13805 bytes
    

    However, the same operation can be done (and actually works for me) using requests:

    >>> import requests
    >>> r = requests.get('https://wa151.avayalive.com/WAAdminPanel/login.aspx?ReturnUrl=%2fWAAdminPanel%2fprivate%2fHome.aspx')
    >>> with open(somefilepath, 'wb') as f:  # binary mode, so the raw bytes are written unchanged
    ...     f.write(r.content)
    

    Does that work for you?
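
If the page should end up in a file, the requests approach can also be written as a streamed download that saves the raw bytes. This is only a sketch under assumptions: write_chunks and download are hypothetical helper names, the 8192-byte chunk size is arbitrary, and requests is imported inside download so that the file-writing helper can be used without it.

```python
import os


def write_chunks(chunks, path):
    """Write an iterable of byte strings to path; return the file size in bytes."""
    with open(path, 'wb') as f:  # binary mode: no text encoding step
        for chunk in chunks:
            if chunk:  # skip empty chunks
                f.write(chunk)
    return os.path.getsize(path)


def download(url, path):
    """Stream url to path with requests and return the number of bytes saved."""
    import requests  # third-party; imported lazily so write_chunks has no dependency

    response = requests.get(url, stream=True)
    try:
        response.raise_for_status()  # fail loudly on 4xx/5xx responses
        return write_chunks(response.iter_content(chunk_size=8192), path)
    finally:
        response.close()
```

Calling download(url, somefilepath) with the URL from the question would then return the size of the saved file, which can be compared against the 13805 bytes the server announced.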