With Python 2.7 I'm scraping with urllib2 and, when XPath is needed, lxml as well. It's fast, and because I rarely have to navigate around the sites, this combination works well. Occasionally, though, usually when I reach a page that will only display some valuable data after a short form is filled in and a submit button is clicked (example), the scraping-only approach with urllib2 is not sufficient.
Every time such a page is encountered, I could invoke selenium.webdriver to refetch the page and do the form-filling and clicking, but this would slow things down considerably.
NOTE: This question is not about the merits or limitations of urllib2, about which I am aware there have been many discussions. It's focused only on finding a fast, headless approach to form-filling and the like (one that also allows for XPath queries if needed).
There are several things you can consider using:

- mechanize
- robobrowser
- selenium with a headless browser like PhantomJS, or with a regular browser running inside a virtual display
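For simple forms, all of these tools are automating the same step urllib2 leaves to you: parse the `<form>`, carry over the hidden default inputs (CSRF tokens and the like), fill in the visible fields, and submit an urlencoded POST to the form's action URL. A minimal stdlib sketch of that step (Python 3 module names shown; on 2.7 the equivalents are `HTMLParser` and `urllib.urlencode` — and the form HTML, field names, and action URL here are made up for illustration):

```python
from html.parser import HTMLParser
from urllib.parse import urlencode

class FormParser(HTMLParser):
    """Collect the action URL and default input values of the first <form>."""
    def __init__(self):
        super().__init__()
        self.action = None
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and self.action is None:
            self.action = attrs.get("action")
        elif tag == "input" and "name" in attrs:
            # keep server-supplied defaults (e.g. hidden csrf token);
            # caller can overwrite the visible fields afterwards
            self.fields.setdefault(attrs["name"], attrs.get("value", ""))

# hypothetical page fragment standing in for the real site
html = """
<form action="/search" method="post">
  <input type="hidden" name="csrf" value="abc123">
  <input type="text" name="query" value="">
  <input type="submit" value="Go">
</form>
"""

parser = FormParser()
parser.feed(html)
parser.fields["query"] = "widgets"      # fill in the visible field
payload = urlencode(parser.fields)      # body for the urllib2 POST

print(parser.action)   # -> /search
print(payload)         # -> csrf=abc123&query=widgets
```

You would then POST `payload` to `parser.action` with the same urllib2 opener (and cookie jar) you already use, so only this one extra round trip is added. This only works when the form is plain HTML; if the page builds or submits the form with JavaScript, you do need one of the browser-driven options above.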