Tags: python, selenium, pyqt4, beautifulsoup, screen-scraping

Scraping Ajax using Python


I am trying to get the data from the table on this website, which is updated via jQuery after the page loads (I have permission):

http://whichchart.com/

I currently use Selenium and BeautifulSoup to get data, but because this data does not appear in the HTML source, I can't access it. I have also tried PyQt4, but it likewise does not return the updated HTML.

The values are visible in Firebug and Chrome Developer Tools, so are there any Python packages that can read them the same way and feed them to BeautifulSoup?

I'm not a massive techie, so ideally I would like a solution that works in Python or, failing that, the next easiest tool.

I'm aware I can get it via proprietary "screen-scraper" software, but that is expensive.


Solution

  • The page makes an AJAX call to http://whichchart.com/service.php?action=NewcastleCoal to fetch the data, and that URL returns the values as JSON. So you can request that URL directly and do the following:

    • Use urllib to fetch the data over HTTP
    • Parse the response with the json library's loads function
    • Now you have a Python object (dicts and lists) to process

    If you need to process HTML page content, I would suggest using a library like BeautifulSoup or Scrapy.
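
The urllib and json steps above can be sketched roughly as follows. This is a minimal sketch for Python 3 (where the call lives in urllib.request), not a tested scraper for this site: the helper names decode_json and fetch_json are made up for illustration, the UTF-8 encoding is an assumption, and the structure of the JSON the service returns is not shown here, so the code does not rely on any particular keys.

```python
import json
from urllib.request import urlopen

# Endpoint named in the answer. The shape of the JSON it returns is not
# shown in the thread, so nothing below assumes particular keys.
DATA_URL = "http://whichchart.com/service.php?action=NewcastleCoal"


def decode_json(raw_bytes, encoding="utf-8"):
    """Turn a raw HTTP response body into Python objects (dicts, lists, ...).

    The UTF-8 default is an assumption; adjust it if the server says otherwise.
    """
    return json.loads(raw_bytes.decode(encoding))


def fetch_json(url, timeout=10):
    """Download `url` over HTTP and parse the body as JSON."""
    with urlopen(url, timeout=timeout) as response:
        return decode_json(response.read())
```

Calling fetch_json(DATA_URL) should then give you an ordinary Python object to inspect, for example with print(type(data)) before digging into it. Keeping the parsing in its own function means you can try it on a response body saved from the browser's network tab without hitting the site. On Python 2 the equivalent call would be urllib2.urlopen.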