With Python 2.7 I'm scraping with urllib2 and, when XPath is needed, lxml as well. It's fast, and because I rarely have to navigate around the sites, this combination works well. Occasionally, though, usually when I reach a page that will only display some valuable data after a short form is filled in and a submit button is clicked (example), the scraping-only approach with urllib2 is not sufficient.
Every time such a page is encountered, I could invoke selenium.webdriver to refetch the page and do the form-filling and clicking, but this would slow things down considerably.
NOTE: This question is not about the merits or limitations of urllib2, about which I am aware there have been many discussions. It's focused only on finding a fast, headless approach to form-filling and the like (one that also allows for XPath queries if needed).
There are several things you can consider using:

- mechanize
- robobrowser
- selenium with a headless browser like PhantomJS, or with a regular browser running inside a virtual display
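For simple forms, all of these tools are automating the same step urllib2 leaves to you: parse the `<form>`, carry over the hidden default inputs (CSRF tokens and the like), fill in the visible fields, and submit an urlencoded POST to the form's action URL. A minimal stdlib sketch of that step (Python 3 module names shown; on 2.7 the equivalents are `HTMLParser` and `urllib.urlencode` — and the form HTML, field names, and action URL here are made up for illustration):

```python
from html.parser import HTMLParser
from urllib.parse import urlencode

class FormParser(HTMLParser):
    """Collect the action URL and default input values of the first <form>."""
    def __init__(self):
        super().__init__()
        self.action = None
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and self.action is None:
            self.action = attrs.get("action")
        elif tag == "input" and "name" in attrs:
            # keep server-supplied defaults (e.g. hidden csrf token);
            # caller can overwrite the visible fields afterwards
            self.fields.setdefault(attrs["name"], attrs.get("value", ""))

# hypothetical page fragment standing in for the real site
html = """
<form action="/search" method="post">
  <input type="hidden" name="csrf" value="abc123">
  <input type="text" name="query" value="">
  <input type="submit" value="Go">
</form>
"""

parser = FormParser()
parser.feed(html)
parser.fields["query"] = "widgets"      # fill in the visible field
payload = urlencode(parser.fields)      # body for the urllib2 POST

print(parser.action)   # -> /search
print(payload)         # -> csrf=abc123&query=widgets
```

You would then POST `payload` to `parser.action` with the same urllib2 opener (and cookie jar) you already use, so only this one extra round trip is added. This only works when the form is plain HTML; if the page builds or submits the form with JavaScript, you do need one of the browser-driven options above.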