I am trying to fetch a page using urllib2.urlopen (actually, I am using mechanize, but this is the method that mechanize calls). When I fetch the page, I am getting incomplete responses; the page gets truncated. However, if I access the non-HTTPS version of the page, I get the complete page.
I am on Arch Linux (3.5.4-1-ARCH x86_64). I am running openssl 1.0.1c. This problem occurs on another Arch Linux machine I own, but not when using Python 3 (3.3.0).
This problem seems to be related to the question "urllib2 not retrieving entire HTTP response".
I tested it on the only online Python interpreter that would let me use urllib2 (Py I/O) and it worked as expected. Here is the code:
import urllib2
u = urllib2.urlopen('https://wa151.avayalive.com/WAAdminPanel/login.aspx?ReturnUrl=%2fWAAdminPanel%2fprivate%2fHome.aspx')
print u.read()[-100:]
The last lines should contain the usual </body></html>.
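A quick way to check for this kind of truncation is to look at the tail of the decoded page text. The following is only a minimal sketch: `looks_complete` is a hypothetical helper name (not part of urllib2 or mechanize), and it assumes the page is a normal HTML document that ends with a closing `</html>` tag.

```python
def looks_complete(html):
    """Return True if the page text ends with a closing </html> tag.

    Hypothetical helper for illustration. It expects text (not raw bytes),
    ignores trailing whitespace, and compares case-insensitively.
    """
    tail = html.rstrip().lower()
    return tail.endswith("</html>")
```

A truncated response such as the one described above would make this return False, while a fully received page would return True.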
When I try urllib.urlretrieve on my machines, I get:
ContentTooShortError: retrieval incomplete: got only 11365 out of 13805 bytes
I cannot test urlretrieve on the online interpreter because it will not let users write to temporary files. Later in the evening, I will try fetching the URL from my machine, but from a different location.
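For context, urlretrieve raises ContentTooShortError when it has read fewer bytes than the Content-Length header announced. The same comparison can be made by hand on any response. This is a rough sketch, not the library's own code: `missing_bytes` is a hypothetical helper, and it assumes the header value and the raw body have already been obtained from the response.

```python
def missing_bytes(content_length_header, body):
    """Return how many bytes are missing relative to Content-Length.

    Hypothetical helper for illustration. `content_length_header` is the
    header value as a string (or None if the server sent no such header),
    and `body` is the raw response body that was actually read.
    """
    if content_length_header is None:
        # Without a declared length there is nothing to compare against.
        return None
    expected = int(content_length_header)
    # Never report a negative shortfall if more data than declared arrived.
    return max(expected - len(body), 0)
```

With the numbers from the error above (11365 bytes received out of 13805 declared), the shortfall is 2440 bytes.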
I'm getting the same error, using Python 2.7, on a different Linux system:
>>> import urllib
>>> urllib.urlretrieve('https://wa151.avayalive.com/WAAdminPanel/login.aspx?ReturnUrl=%2fWAAdminPanel%2fprivate%2fHome.aspx')
---------------------------------------------------------------------------
ContentTooShortError Traceback (most recent call last)
...
ContentTooShortError: retrieval incomplete: got only 11365 out of 13805 bytes
However, the same operation can be done (and actually works for me) using requests:
>>> import requests
>>> r = requests.get('https://wa151.avayalive.com/WAAdminPanel/login.aspx?ReturnUrl=%2fWAAdminPanel%2fprivate%2fHome.aspx')
>>> with open(somefilepath, 'w') as f:
...     f.write(r.text)
Is that working for you?