python web-scraping web-crawler urllib2

How to wait for the page to load before scraping it?


I want to extract the HTML from a webpage:

import urllib2
req = urllib2.Request('https://www.example.com')
response = urllib2.urlopen(req)
fullhtml = response.read()

I tried with "urllib2", but since the page is built dynamically, the HTML I get back is empty.

Is there a way to wait for the JavaScript to finish loading before reading the HTML?


Solution

  • Take a look at PhantomJS (http://phantomjs.org/), a scriptable headless browser. Many websites build their content with JavaScript, and an HTTP client in PHP or Python only downloads that script without executing it. I think this tool will be the best you can get.
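
As a rough sketch of what "waiting for the page" can look like from Python: the helper below polls until some expected text shows up in the rendered HTML, or gives up after a timeout. It assumes Python 3 rather than the Python 2 of the question, and it swaps PhantomJS for Selenium driving headless Chrome, which is my substitution and not something the answer above prescribes. The names wait_until, fetch_rendered_html and marker are made up for illustration, and the Selenium part needs the selenium package plus a local Chrome install.

```python
import time


def wait_until(predicate, timeout=10.0, interval=0.25):
    """Call predicate() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = predicate()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(interval)


def fetch_rendered_html(url, marker, timeout=10.0):
    """Load `url` in headless Chrome and return the HTML once `marker` appears.

    Assumption: Selenium 4 and Chrome are installed. `marker` is a
    hypothetical piece of text that only exists after the scripts have run.
    """
    # Imported here so wait_until stays usable without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)

        def rendered():
            # page_source reflects the DOM after JavaScript has modified it.
            html = driver.page_source
            return html if marker in html else None

        return wait_until(rendered, timeout=timeout)
    finally:
        # Always shut the browser down, even if the wait timed out.
        driver.quit()
```

Calling something like fetch_rendered_html('https://www.example.com', 'text you expect to see') should then give you the HTML after the scripts have filled it in. Pick a marker that is only present once the dynamic content has loaded, otherwise the wait returns too early.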