Search code examples
javascriptpythonbrowserscreen-scraping

scrape html generated by javascript with python


I need to scrape a site with python. I obtain the source html code with the urlib module, but I need to scrape also some html code that is generated by a javascript function (which is included in the html source). What this functions does "in" the site is that when you press a button it outputs some html code. How can I "press" this button with python code? Can scrapy help me? I captured the POST request with firebug but when I try to pass it on the url I get a 403 error. Any suggestions?


Solution

  • In Python, I think Selenium 1.0 is the way to go. It’s a library that allows you to control a real web browser from your language of choice.

    You need to have the web browser in question installed on the machine your script runs on, but it looks like the most reliable way to programmatically interrogate websites that use a lot of JavaScript.