javascript python html screen-scraping

State of HTML after onload JavaScript


Many web pages use onload JavaScript to manipulate their DOM. Is there a way to automate access to the state of the HTML after these JavaScript operations have run?

A tool like wget is not useful here because it only downloads the original source. Is there perhaps a way to use a web browser rendering engine?

Ideally I am after a solution that I can interface with from Python.

Thanks!


Solution

  • The only good way I know to do this is to automate a real browser, for example via Selenium RC. If you have no way to deduce that the page has finished running the relevant JavaScript, then, just like a real live user visiting that page, you'll have to wait a while, grab a snapshot, wait some more, grab another, and check that nothing changed between them to convince yourself that it's really finished.
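
The wait-and-compare loop described above could be sketched in Python roughly as follows. This is only an illustration under some assumptions: it uses the Selenium WebDriver Python bindings (`webdriver.Firefox`, `driver.page_source`) rather than Selenium RC, and the helper names `wait_for_stable_snapshot` and `rendered_html`, as well as the interval and timeout defaults, are made up for this example.

```python
import time


def wait_for_stable_snapshot(get_snapshot, interval=1.0, max_wait=30.0, sleep=time.sleep):
    """Poll get_snapshot() until two consecutive snapshots are identical.

    Returns the stable snapshot, or raises TimeoutError if no two
    consecutive snapshots match within roughly max_wait seconds.
    (Hypothetical helper; name and defaults are assumptions.)
    """
    previous = get_snapshot()
    waited = 0.0
    while waited < max_wait:
        sleep(interval)            # wait a while...
        waited += interval
        current = get_snapshot()   # ...grab another snapshot...
        if current == previous:    # ...and check that nothing changed
            return current
        previous = current
    raise TimeoutError("page did not settle within %s seconds" % max_wait)


def rendered_html(url, interval=1.0, max_wait=30.0):
    """Load url in a real browser and return the DOM once it stops changing."""
    # Imported lazily so the polling helper above works without Selenium installed.
    from selenium import webdriver

    driver = webdriver.Firefox()   # assumes Firefox and its driver are available
    try:
        driver.get(url)
        # page_source reflects the current DOM, not the originally downloaded source
        return wait_for_stable_snapshot(lambda: driver.page_source, interval, max_wait)
    finally:
        driver.quit()
```

Calling `rendered_html("http://example.com/")` would then return the HTML once two snapshots taken one second apart match. Note that two identical snapshots only suggest the scripts have finished; a page that updates on a slower timer can still change afterwards, so the interval may need tuning per site.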