Tags: python, header, beautifulsoup, python-requests, response

Why doesn't the page response change when this query is updated?


I am having trouble reliably extracting a value (the property count) from pages on https://www.booking.com.

When searching for Brazil, it shows 29,454 properties.

But when I update the query for a different country, it lists the same number (plus or minus 1). I'm not sure whether this has to do with the headers or the query.

Maybe there is an easier way to extract the information.

Brazil should have 29,000+ properties and Uruguay should have 1,629.

The following code is expected to behave as if I were searching for the country's name on Booking.com:

import requests
from bs4 import BeautifulSoup

from requests.packages.urllib3.exceptions import InsecureRequestWarning
requests.packages.urllib3.disable_warnings(InsecureRequestWarning)

url = "https://www.booking.com/searchresults.en-gb.html"

countries = [u'Brazil', u'Uruguay']

for country in countries:

    querystring = {"label": "gen173nr-1DCAEoggJCAlhYSDNiBW5vcmVmcgV1c19vcogBAZgBMbgBB8gBDdgBA-gBAfgBApICAXmoAgM",
                   "lang": "en-gb", "sid": "5f9b0b3af27a0a0b48017c6c387d8224", "track_lsso": "2", "sb": "1",
                   "src": country, "src_elem": "sb",
                   "ss": country.replace(' ', '+'), "ssne": country, "ssne_untouched": country, "dest_id": "30", "dest_type": "country",
                   "checkin_monthday": "", "checkin_month": "", "checkin_year": "", "checkout_monthday": "",
                   "checkout_month": "", "checkout_year": "", "room1": "A", "no_rooms": "1", "group_adults": "1",
                   "group_children": "0"}

    headers = {
        'upgrade-insecure-requests': "1",
        'user-agent': "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36",
        'accept': "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
        'content-encoding': "br",
        'accept-language': "en-US,en;q=0.8",
        'content-type': "text/html;charset=UTF-8",
        'cache-control': "no-cache",
        'postman-token': "124b1e3b-c4de-9ab0-162f-003770797f9f"
    }

    response = BeautifulSoup(requests.request("GET", url, headers=headers, params=querystring, verify=False).content,
                             "html.parser")

    totalPropCount = response.select('h1[class="sorth1"]')[0].text

    print(totalPropCount.split(': ')[1], 'for', country)

Solution

  • Your problem is you're hardcoding the dest_id. A dest_id of 30 simply points to Brazil!

    You can verify by using the following:

    querystring = {"src": country,
                   "dest_id": "225", "dest_type": "country"}
    

    Note that I removed a lot of the parameters to simplify, but most importantly I changed the dest_id to 225. 225 is the dest_id of Uruguay, while dest_id 30 (the one you had hardcoded) is Brazil.

    Every time you made your request, you were requesting Brazil's information, so you got the same number! Plug this querystring in, and you should see Uruguay's info.

    I'm not sure what the best way is to automatically populate it, maybe just look up the codes you're interested in and save them in a dict? That way every time through the loop you get the correct dest_id.

    In fact, none of the other strings in querystring that you plugged country into (ssne, src, ssne_untouched) even contribute to the end result. You can pull up Uruguay's info using just the 3 fields in my example.
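
The dict idea could look roughly like the sketch below. It is only an illustration, written for Python 3: the two dest_id values are the ones mentioned in this answer, while DEST_IDS, build_query and build_url are made-up names, and the code only builds the URL without sending any request.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.booking.com/searchresults.en-gb.html"

# dest_id values taken from this answer (30 = Brazil, 225 = Uruguay).
# Codes for any other country would have to be looked up by hand.
DEST_IDS = {"Brazil": "30", "Uruguay": "225"}


def build_query(country):
    """Return the three query parameters from the answer for one country."""
    return {
        "src": country,
        "dest_id": DEST_IDS[country],  # raises KeyError for an unknown country
        "dest_type": "country",
    }


def build_url(country):
    """Hypothetical helper: the full search URL for one country."""
    return BASE_URL + "?" + urlencode(build_query(country))


for country in DEST_IDS:
    print(country, build_url(country))
```

In the original loop, build_query(country) would be passed as params= in place of the long hardcoded querystring, so each iteration asks for its own dest_id.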