
Get innerHTML of a JS element from a website using Java


Goal: Get the inner text of a JavaScript-rendered element on a Yahoo Finance page. Please refer to the screenshot:

[screenshot]

I can get the innerHTML using the code below:

document.getElementsByClassName('D(ib) Va(t)')[15].childNodes[2].innerHTML

But I can't find a way to run this against the Yahoo Finance page from Java.

I've briefly tried the following APIs:

  • Jsoup
  • HtmlUnit
  • Nashorn

I think Nashorn can get the text I'm looking for, but I haven't been able to do it yet.

If anyone has done something similar or can point me in the right direction, that would be much appreciated.

Let me know if more details are needed.


Solution

  • HtmlUnit seems to have problems with this site: the response it returns is incomplete. You could use PhantomJS instead. Just download the binary for your OS and create a script file (see the PhantomJS API).

    Script (yahoo.js):

    var page = require('webpage').create();
    var fs = require('fs');

    // Present a desktop browser user agent to the site
    page.settings.userAgent = 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.120 Safari/537.36';
    // Give up on any single resource after 5000 ms (a number, not a string)
    page.settings.resourceTimeout = 5000;

    page.open('http://finance.yahoo.com/quote/AAPL/profile?p=AAPL', function(status) {
      console.log("Status: " + status);
      if (status === "success") {
        // Save the rendered page so the Java code can parse it with Jsoup
        var path = 'yahoo.html';
        fs.write(path, page.content, 'w');
      }
      phantom.exit();
    });
    

    Java code:

    // Requires: java.io.File, org.jsoup.Jsoup, org.jsoup.nodes.Element, org.jsoup.select.Elements
    try {
        // Change these paths to your PhantomJS binary and your script file
        String phantomJSPath = "bin" + File.separator + "phantomjs";
        String scriptFile = "yahoo.js";

        // Pass the arguments separately so paths containing spaces still work
        Process process = new ProcessBuilder(phantomJSPath, scriptFile).start();
        process.waitFor();

        // Parse yahoo.html, which the script writes to the working directory
        Elements elements = Jsoup.parse(new File("yahoo.html"), "UTF-8")
                .select("div.asset-profile-container p strong");

        for (Element element : elements) {
            // Keep only the element whose data-reactid matches the wanted field
            if (element.attr("data-reactid").contains("asset-profile.1.1.1.2")) {
                System.out.println(element.text());
            }
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
    

    Output:

    Consumer Goods
    

    Note: The following link returns a JSON object containing the company information. I'm not sure, though, whether the crumb parameter changes or stays constant for a company: https://query2.finance.yahoo.com/v10/finance/quoteSummary/AAPL?formatted=true&crumb=hm4%2FV0JtzlL&lang=en-US&region=US&modules=assetProfile%2CsecFilings%2CcalendarEvents&corsDomain=finance.yahoo.com
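
If you want to try the JSON link from the note for other tickers, here is a minimal sketch that only assembles the URL. The class name `QuoteSummaryUrl` and the method `build` are made up for illustration, the parameter names are copied from the link above, and it is an assumption that the crumb can simply be passed in (it may well be session-specific). The sketch does not send the request or parse the response.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper: rebuilds the quoteSummary link from the note above.
public class QuoteSummaryUrl {

    // Base endpoint copied from the note; not verified beyond that.
    private static final String BASE =
            "https://query2.finance.yahoo.com/v10/finance/quoteSummary/";

    // Takes the raw (unencoded) crumb and module names and percent-encodes them.
    public static String build(String ticker, String crumb, String... modules) {
        return BASE + encode(ticker)
                + "?formatted=true"
                + "&crumb=" + encode(crumb)
                + "&lang=en-US&region=US"
                + "&modules=" + encode(String.join(",", modules))
                + "&corsDomain=finance.yahoo.com";
    }

    private static String encode(String value) {
        try {
            // Turns "/" into %2F and "," into %2C, as in the original link
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is guaranteed to be supported by every JVM
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Crumb value taken from the note; assumed to be an example only
        System.out.println(build("AAPL", "hm4/V0JtzlL",
                "assetProfile", "secFilings", "calendarEvents"));
    }
}
```

With the values from the note, `build` reproduces the same link, so you can swap the ticker or the module list and check whether the crumb still works.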