Search code examples
pythonapidata-mininggoogle-finance

Obtaining financial data from Google Finance which is outside the scope of the API


Google's finance API is incomplete -- many of the figures on a page such as:

http://www.google.com/finance?fstype=ii&q=NYSE:GE

are not available via the API.

I need this data to rank companies on Canadian stock exchanges according to the formula of Greenblatt, available via google search for "greenblatt index scans".

My question: what is the most intelligent/clean/efficient way of accessing and processing the data on these webpages. Is the tedious approach really necessary in this case, and if so, what is the best way of going about it? I'm currently learning Python for projects related to this one.


Solution

  • You could try asking Google to provide the missing APIs. Otherwise, you're stuck with screen scraping, which is never fun, prone to breaking without notice, and likely in violation of Google's terms of service.

    But, if you still want to write a screen scraper, it's hard to beat a combination of mechanize and BeautifulSoup. BeautifulSoup is an HTML parser and mechanize is a Python-based web browser that will let you log in, store cookies, and generally navigate around like any other web browser.