python, urllib2, urllib

Different crawling behavior on Ubuntu and Windows


This piece of code retrieves the content of a page of Google Movies:

import urllib2
f = urllib2.urlopen("https://www.google.com/movies?hl=fr&tid=4f451a87a71bfa51&date=0")
print(f.read())

The output correctly lists the movies scheduled at this theater when I run the script on my Windows PC. But when I ran the same script on three different Ubuntu servers, the content returned was, every time, a well-formed page saying that no movies are currently scheduled.

Do you know what could cause such a difference in behavior for just three lines of code? I also tried urllib.urlopen, and the output is the same.


Solution

  • It has nothing to do with the OS itself, or with Python in general. I opened this URL in a browser on a Windows machine and also got something along the lines of "No films found" (I used Google Translate, as I don't speak French).

    I suspect this URL is location-sensitive. When you accessed it from your Windows machine, the site managed to determine your location (either your actual location or an estimate based on your IP address).

    When you accessed it from your Linux machines, the site either couldn't determine your location or determined it and decided that it was "wrong", so the request didn't match any theater schedule.
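
One way to test this hypothesis is to send exactly the same request from each machine and compare the results. The sketch below is only an illustration, not a confirmed fix: it is written for Python 3, where the functionality of urllib2 lives in urllib.request, and the helper names build_request and fetch_summary, the header values, and the timeout are assumptions chosen for the example. The URL is the one from the question and may no longer serve the same page.

```python
import hashlib
import urllib.request

# URL taken from the question.
URL = "https://www.google.com/movies?hl=fr&tid=4f451a87a71bfa51&date=0"


def build_request(url, accept_language="fr-FR,fr;q=0.9", user_agent="Mozilla/5.0"):
    """Build a request with explicit headers (hypothetical helper).

    Pinning the headers rules out differences that come from the client
    itself, so the machine's IP address is the main remaining variable.
    The default header values are illustrative assumptions.
    """
    headers = {"Accept-Language": accept_language, "User-Agent": user_agent}
    return urllib.request.Request(url, headers=headers)


def fetch_summary(url, **header_options):
    """Fetch the page and return a few values that are easy to compare
    between machines (hypothetical helper)."""
    request = build_request(url, **header_options)
    with urllib.request.urlopen(request, timeout=10) as response:
        body = response.read()
        return {
            "status": response.status,
            "final_url": response.geturl(),  # reveals any redirect
            "length": len(body),
            "sha256": hashlib.sha256(body).hexdigest(),
        }
```

Calling print(fetch_summary(URL)) on the Windows PC and on an Ubuntu server gives two summaries to compare. If the headers are identical and the responses still differ, that is consistent with the server choosing its response by IP-based location, although it does not prove it.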