Search code examples
google-apps-scriptgoogle-sheetsgoogle-sheets-formulayahoo-finance

Google Sheets Scraping Options Chain from Yahoo Finance, Incomplete Results


I'm attempting to scrape options pricing data from Yahoo Finance in Google Sheets. Although I'm able to pull the options chain just fine, i.e.

=IMPORTHTML("https://finance.yahoo.com/quote/TCOM/options?date=1610668800","table",2)

I find that it's returning results that don't completely match what's actually shown on Yahoo Finance. Specifically, the scraped results are incomplete - they're missing some strikes. i.e. the first 5 rows of the chart may match, but then it will start returning only every other strike (aka skipping every other strike).

Why would IMPORTHTML be returning "abbreviated" results, which don't match what's actually shown on the page? And more importantly, is there some way to scrape complete data (i.e. that doesn't skip some portion of the available strikes)?


Solution

  • I believe your goal as follows.

    • You want to retrieve the complete table from the URL of https://finance.yahoo.com/quote/TCOM/options?date=1610668800, and want to put it to the Spreadsheet.

    Issue and workaround:

    I could replicate your issue. When I saw the HTML data, unfortunately, I couldn't find the difference of HTML between the showing rows and the not showing rows. And also, I could confirm that the complete table is included in the HTML data. By the way, when I tested it using =IMPORTXML(A1,"//section[2]//tr"), the same result of IMPORTHTML occurs. So I thought that in this case, IMPORTHTML and IMPORTXML might not be able to retrieve the complete table.

    So, in this answer, as a workaround, I would like to propose to put the complete table parsed using Sheets API. In this case, Google Apps Script is used. By this, I could confirm that the complete table can be retrieved by parsing the HTML table with Sheet API.

    Sample script:

    Please copy and paste the following script to the script editor of Spreadsheet, and please enable Sheets API at Advanced Google services. And, please run the function of myFunction at the script editor. By this, the retrieved table is put to the sheet of sheetName.

    function myFunction() {
      // Please set the following variables.
      const url ="https://finance.yahoo.com/quote/TCOM/options?date=1610668800";
      const sheetName = "Sheet1";  // Please set the destination sheet name.
      const sessionNumber = 2;  // Please set the number of session. In this case, the table of 2nd session is retrieved.
    
      const html = UrlFetchApp.fetch(url).getContentText();
      const section = [...html.matchAll(/<section[\s\S\w]+?<\/section>/g)];
      if (section.length >= sessionNumber) {
        if (section[sessionNumber].length == 1) {
          const table = section[sessionNumber][0].match(/<table[\s\S\w]+?<\/table>/);
          if (table) {
            const ss = SpreadsheetApp.getActiveSpreadsheet();
            const body = {requests: [{pasteData: {html: true, data: table[0], coordinate: {sheetId: ss.getSheetByName(sheetName).getSheetId()}}}]};
            Sheets.Spreadsheets.batchUpdate(body, ss.getId());
          }
        } else {
          throw new Error("No table.");
        }
      } else {
        throw new Error("No table.");
      }
    }
    
    • const sessionNumber = 2; means that 2 of =IMPORTHTML("https://finance.yahoo.com/quote/TCOM/options?date=1610668800","table",2).

    References: