
Python: urlopen() versus curl


I'm writing a web crawler in Python and enjoying it a lot! But I've noticed some differences between the result of urlopen(url).read() in Python and the result of curl in the terminal. I tried to install the pycurl module without success. Is there a simple way to get the same result as curl in Python?
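When urlopen and curl disagree, a reasonable first step is to make urlopen send the same headers as the curl call, for example the User-Agent that curl sets with -A. The sketch below assumes Python 3, where urllib2's Request and urlopen live in urllib.request; build_request and fetch are hypothetical helper names, not part of any library.

```python
import urllib.request  # Python 3; in Python 2 these lived in urllib2


def build_request(url, user_agent="Mozilla/5.0"):
    """Return a Request with the same User-Agent as: curl -A "Mozilla/5.0" <url>"""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, user_agent="Mozilla/5.0"):
    """Fetch url with urlopen and return the raw body bytes."""
    with urllib.request.urlopen(build_request(url, user_agent), timeout=10) as resp:
        return resp.read()
```

You can then diff fetch(url) against the output of curl -s -A "Mozilla/5.0" <url>. Note that urllib stores header names with only the first letter capitalized, so you read the header back with req.get_header("User-agent").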

UPDATE

In this case I parsed this URL. I sent the same header on both requests (User-Agent: Mozilla/5.0). Here are the outputs:


Solution

  • I know this is an old question, but the answer may still be useful.

    I had the same problem, and I solved it by creating a PHP file that printed the request headers it received. Then I requested that file once with curl and once with urlopen and compared the headers each one sent. You can find an example of such a script in the PHP docs.

    In addition, you can check in your browser which headers it sends. Doing this, I saw that urlopen sends Connection: close instead of keep-alive.

    So in the end I added the keep-alive header, and urlopen began to behave like curl. This was my specific problem; yours may differ depending on what the server requires, and you may need to add or remove a different header.
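The procedure in the answer (echo the request headers, compare clients, then add keep-alive) can be sketched in Python alone, with http.server standing in for the PHP script. This is a swap-in, not the script from the PHP docs, and HeaderEchoHandler, start_echo_server and the two helper functions are hypothetical names. One caveat: in the CPython 3 source I'm familiar with, urllib.request's HTTP handler sets Connection: close on every request just before sending it, so a keep-alive header added to a Request may be silently overwritten. For that reason the sketch sends the keep-alive request with the lower-level http.client module, which passes the header through as given. It handles plain http:// URLs only.

```python
import http.client
import http.server
import threading
import urllib.request
from urllib.parse import urlsplit


class HeaderEchoHandler(http.server.BaseHTTPRequestHandler):
    """Replies with the request line and headers it received (the PHP script's role)."""

    def do_GET(self):
        body = (self.requestline + "\n" + str(self.headers)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the console quiet


def start_echo_server(port=0):
    """Serve HeaderEchoHandler on 127.0.0.1 in a background thread; port 0 picks a free port."""
    server = http.server.HTTPServer(("127.0.0.1", port), HeaderEchoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def headers_seen_by_urlopen(url, user_agent="Mozilla/5.0"):
    """What the echo server receives from urlopen (expect Connection: close)."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read().decode("utf-8")


def headers_seen_with_keep_alive(url, user_agent="Mozilla/5.0"):
    """Same GET via http.client, sending a curl-like header set with keep-alive."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        conn.request("GET", path, headers={
            "User-Agent": user_agent,
            "Accept": "*/*",  # curl sends this Accept header by default
            "Connection": "keep-alive",
        })
        return conn.getresponse().read().decode("utf-8")
    finally:
        conn.close()


if __name__ == "__main__":
    server = start_echo_server()
    url = "http://127.0.0.1:%d/" % server.server_address[1]
    print("urlopen sent:\n" + headers_seen_by_urlopen(url))
    print("http.client sent:\n" + headers_seen_with_keep_alive(url))
    server.shutdown()
    server.server_close()
```

To see curl's side of the comparison, keep an echo server running on a fixed port (for example start_echo_server(8000)) and run curl -A "Mozilla/5.0" http://127.0.0.1:8000/ against it.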