python screen-scraping

web scraping a problem site


I'm trying to scrape some information from a website, but I'm having trouble reading the relevant pages. The pages seem to send a basic skeleton first and then fill in the detailed info afterwards. My download attempts only capture the basic skeleton. I've tried urllib and mechanize so far.

Firefox and Chrome have no trouble displaying the pages, although I can't see the parts I want when I view page source.

A sample url is https://personal.vanguard.com/us/funds/snapshot?FundId=0542&FundIntExt=INT

I'd like, for example, average maturity and average duration from the lower right of the page. The problem isn't extracting that info from the page; it's downloading the page so that I can extract the info.


Solution

  • The website loads the data via AJAX, which is why it is missing from both the page source and a plain download. Firebug (or any browser's network inspector) shows the AJAX calls. For the given page, the data is loaded from https://personal.vanguard.com/us/JSP/Funds/VGITab/VGIFundOverviewTabContent.jsf?FundIntExt=INT&FundId=0542

    See the corresponding JavaScript code on the original page:

    <script>populator = new Populator({parentId:
    "profileForm:vanguardFundTabBox:tab0",execOnLoad:true,
     populatorUrl:"/us/JSP/Funds/VGITab/VGIFundOverviewTabContent.jsf?FundIntExt=INT&FundId=0542",
    inline:false,type:"once"});
    </script>
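
One way to apply this from Python is to pull the `populatorUrl` value out of the initial page and request that URL directly. The following is a minimal sketch using only the standard library, not a tested scraper for this site. The helper names `find_populator_urls` and `fetch` are hypothetical, the regular expression assumes the script looks like the snippet above, and whether the server also expects cookies or particular headers is not known.

```python
import re
from urllib.parse import urljoin
from urllib.request import Request, urlopen

PAGE_URL = "https://personal.vanguard.com/us/funds/snapshot?FundId=0542&FundIntExt=INT"

# Assumption: every AJAX endpoint appears as populatorUrl:"..." in an inline script.
POPULATOR_RE = re.compile(r'populatorUrl\s*:\s*"([^"]+)"')


def find_populator_urls(html, base_url):
    """Return absolute URLs for every populatorUrl found in the HTML."""
    return [urljoin(base_url, path) for path in POPULATOR_RE.findall(html)]


def fetch(url, timeout=30):
    """Download a URL and return its body as text (assumes UTF-8)."""
    # The User-Agent header is a guess; the site may or may not require one.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


# Offline demonstration on the script snippet quoted above.
sample = '''<script>populator = new Populator({parentId:
"profileForm:vanguardFundTabBox:tab0",execOnLoad:true,
 populatorUrl:"/us/JSP/Funds/VGITab/VGIFundOverviewTabContent.jsf?FundIntExt=INT&FundId=0542",
inline:false,type:"once"});
</script>'''

urls = find_populator_urls(sample, PAGE_URL)
print(urls[0])
```

Against the live site, this would become `find_populator_urls(fetch(PAGE_URL), PAGE_URL)` followed by `fetch(...)` on each result. The fragments that come back can then be parsed with whatever extraction code you already have.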