I have been trying to use YQL to access the fundamentals of listed companies, but the data shown on the Yahoo Finance pages is not returned by my YQL queries. Specifically, I need to retrieve data from the balance sheet and income statement. A sample YQL query for Apple looks like this:
SELECT * FROM yahoo.finance.balancesheet WHERE symbol='AAPL'
This, however, returns only the time frame (quarterly) and nothing else.
Is the data inaccessible to YQL, or is there something wrong with the way I am running the query? How can I get the complete data shown at http://finance.yahoo.com/q/bs?s=AAPL through YQL?
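To run the same query outside the console, it helps to see what the REST request looks like. The sketch below only builds the request URL. The endpoint `https://query.yahooapis.com/v1/public/yql` and the `env=store://datatables.org/alltableswithkeys` parameter (which loads the community `yahoo.finance.*` tables) are assumptions based on how the public YQL service was commonly called, and the function name `build_yql_url` is hypothetical. The public YQL service has since been retired, so treat this as illustrative.

```python
from urllib.parse import urlencode

# Assumed endpoint of the public YQL REST service.
YQL_ENDPOINT = "https://query.yahooapis.com/v1/public/yql"
# Assumed environment that makes community tables such as
# yahoo.finance.balancesheet (hosted on datatables.org) available.
COMMUNITY_ENV = "store://datatables.org/alltableswithkeys"


def build_yql_url(query: str, fmt: str = "json") -> str:
    """Return a GET URL asking YQL to run `query` against the community tables."""
    params = {"q": query, "format": fmt, "env": COMMUNITY_ENV}
    # urlencode percent-encodes the query text (spaces become '+').
    return YQL_ENDPOINT + "?" + urlencode(params)
```

For example, `build_yql_url("SELECT * FROM yahoo.finance.balancesheet WHERE symbol='AAPL'")` yields a URL whose `q` parameter carries the query shown above.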
Your query used to work fine. However, a month or two ago, a number of yahoo.finance YQL "tables" stopped working.
In other words, you are doing it right, but YQL is broken.
If you mouse over the yahoo.finance.balancesheet entry in the left column of the YQL console, buttons labeled desc and src appear. If you click src, it fetches the scraping code for you: http://www.datatables.org/yahoo/finance/yahoo.finance.balancesheet.xml. To make the E4X JavaScript legible, right-click and select View Source, or use wget or curl from the command line.
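If you would rather pull the scraping code out with a script than with View Source, wget or curl, here is a minimal sketch. It assumes the table definition follows the Open Data Table layout, in which the JavaScript sits inside an `<execute>` element; the function name `extract_execute_script` is hypothetical.

```python
import xml.etree.ElementTree as ET


def extract_execute_script(table_xml: str) -> str:
    """Return the JavaScript inside the first <execute> element of a
    YQL table definition, ignoring any XML namespace on the tags."""
    root = ET.fromstring(table_xml)
    for elem in root.iter():
        # With a default namespace, tags look like "{namespace}execute".
        if elem.tag.rsplit("}", 1)[-1] == "execute":
            return (elem.text or "").strip()
    raise ValueError("no <execute> element found")
```

To feed it the real file, download the text with `urllib.request.urlopen(url).read().decode("utf-8")` and pass the result in.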
Notice that the code fetches http://finance.yahoo.com/q/bs?s=AAPL&quarterly and then uses an XPath query to find the data:
var query = y.xpath(rawresult, "//table[@class='yfnc_tabledata1']/tr/td/table/tr");
If you fetch the page into your browser and inspect the HTML, you find that there is indeed a table with class yfnc_tabledata1. However, it has no direct tr child. Apparently, Yahoo added a tbody element, which probably explains why the query no longer scrapes any data.
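The effect of an inserted tbody on a direct-child path can be reproduced with Python's standard library. This sketch uses `xml.etree.ElementTree`, whose limited XPath dialect needs a leading `.//` in place of the table's leading `//`. The markup is a hypothetical, simplified stand-in for the Yahoo page (real pages are HTML rather than well-formed XML and would need an HTML-tolerant parser such as lxml), and `count_rows` is an illustrative helper.

```python
import xml.etree.ElementTree as ET

# Same shape as the table's expression: every step must be a direct child.
STRICT_PATH = ".//table[@class='yfnc_tabledata1']/tr/td/table/tr"

# Hypothetical markup without <tbody>, as the scraper expected it.
OLD_MARKUP = """<div><table class="yfnc_tabledata1">
  <tr><td><table><tr><td>Total Assets</td><td>1,000</td></tr></table></td></tr>
</table></div>"""

# Hypothetical markup with <tbody> between each <table> and its rows.
NEW_MARKUP = """<div><table class="yfnc_tabledata1"><tbody>
  <tr><td><table><tbody><tr><td>Total Assets</td><td>1,000</td></tr></tbody></table></td></tr>
</tbody></table></div>"""


def count_rows(markup: str, path: str) -> int:
    """Parse `markup` as XML and count the elements matched by `path`."""
    return len(ET.fromstring(markup).findall(path))


print(count_rows(OLD_MARKUP, STRICT_PATH))  # 1
print(count_rows(NEW_MARKUP, STRICT_PATH))  # 0
```

The path is unchanged between the two runs; only the extra tbody level makes the `/tr` step after the outer table match nothing.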
The code page lists Ryan Hoium as the author. A little googling leads to the GitHub repository where the code lives, alongside the code for the other Yahoo Finance tables.
Sadly, only the yahoo.finance.sectors table has received recent attention. The change was to add double slashes to its XPath expression. Double slashes relax the "direct child" requirement, allowing, for example, tr to still be found even if there is an intervening tbody. However, it appears the new version has not been pushed out to the public site.
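A sketch of the double-slash idea, again with `xml.etree.ElementTree` and hypothetical simplified markup. `RELAXED_PATH` is an illustrative rewrite of the balance-sheet expression, not the actual change made in the repository, and `row_texts` is a hypothetical helper. The `//` before each `tr` accepts the rows whether or not a tbody sits in between.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup in which both tables wrap their rows in <tbody>.
NEW_MARKUP = """<div><table class="yfnc_tabledata1"><tbody>
  <tr><td><table><tbody><tr><td>Total Assets</td><td>1,000</td></tr></tbody></table></td></tr>
</tbody></table></div>"""

# Original shape: each step must be a direct child.
STRICT_PATH = ".//table[@class='yfnc_tabledata1']/tr/td/table/tr"
# Relaxed shape: "//" before each tr lets an intervening <tbody> through.
RELAXED_PATH = ".//table[@class='yfnc_tabledata1']//tr/td/table//tr"


def row_texts(markup: str, path: str) -> list:
    """Return the cell texts of every row matched by `path`."""
    root = ET.fromstring(markup)
    return [[td.text or "" for td in tr.findall("td")] for tr in root.findall(path)]


print(row_texts(NEW_MARKUP, STRICT_PATH))   # []
print(row_texts(NEW_MARKUP, RELAXED_PATH))  # [['Total Assets', '1,000']]
```

The trade-off is that `//` also matches rows of any deeper nested tables, so a relaxed expression can pick up more than intended; the remaining `/td/table` steps keep it anchored to the expected nesting here.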