ruby web-scraping mechanize

Mechanize submit result is not the correct page


I was trying to scrape booking.com as an exercise to learn Mechanize, but I can't get past one issue. I am trying to get a hotel's prices through Mechanize using the following code:

require 'mechanize'
require 'date'

hotel_name = "Hilton New York"
date = Date.today
day_after_date = date + 1
agent = Mechanize.new

homepage = agent.get("http://www.booking.com")
# Fill out the main form on the booking.com homepage
main_form = homepage.form_with(name: 'frm')
main_form.ss = hotel_name
main_form.checkin_monthday = date.day.to_s
main_form.checkin_year_month = "#{date.year}-#{date.month}"
main_form.checkout_monthday = day_after_date.day.to_s
main_form.checkout_year_month = "#{day_after_date.year}-#{day_after_date.month}"
main_form[''] = 1 # 1 adult, 0 children

homepage.save('1-homepage.html') # For debugging purposes

# Choose the hotel from the list that comes up
hotel_selection_page = agent.submit main_form
# Escape the name so it is matched literally, and take the first matching link
hotel_link = hotel_selection_page.links.find { |link| link.text =~ /#{Regexp.escape(hotel_name)}/i }
hotel_page = hotel_link.click

# For debugging purposes
hotel_selection_page.save('2-hotels-list.html')
hotel_page.save('3-hotel-page.html')

If you follow the pages through your web browser, you will see that, after submitting the form on the homepage and choosing the hotel on the next page, you see the room prices for the selected date.

With Mechanize, however, the saved 3-hotel-page.html does not show the prices.

I have been at this for a while, and I can't seem to solve it. I thought the problem was the JavaScript that booking.com uses, but even after turning off JavaScript in my web browser, I still got the correct behavior there.

Any thoughts on this?

Edit: I just realized that when the form is sent through the web browser, on the second page where you choose the hotel, hotel links have a sid parameter (for example, sid=ba232d9d340c66ae73f1ded22b80a0da), but when I send the form through Mechanize, I don't get the sid parameter. What could be the reason?
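One way to check this from the script rather than by reading the saved HTML is to look for the sid parameter in each link's href. The following is only a rough sketch using Ruby's standard library: sid_from is a hypothetical helper, the two sample hrefs are made up for illustration, and it assumes the parameters are separated by & (real booking.com links may be formatted differently).

```ruby
require 'uri'

# Hypothetical helper: returns the value of the "sid" query parameter
# of a link href, or nil if the href has no such parameter.
def sid_from(href)
  query = URI.parse(href).query
  return nil if query.nil?

  # decode_www_form yields [key, value] pairs; turn them into a hash
  URI.decode_www_form(query).to_h['sid']
end

# Made-up example hrefs, not actual booking.com URLs
with_sid    = '/hotel/us/example.html?sid=ba232d9d340c66ae73f1ded22b80a0da&checkin=1'
without_sid = '/hotel/us/example.html'

p sid_from(with_sid)
p sid_from(without_sid)
```

With Mechanize, the same helper could be applied to a link as sid_from(hotel_link.href), which would make it easy to see whether a given response contains the session parameter.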


Solution

  • Adding the following line to change the user agent worked in the end:

    agent.user_agent_alias = 'Mac Safari'