java html jsoup html-parsing

Java jsoup connection issue


I am trying to get information on a stock, but it is not working (I will eventually pass the ticker as an input to a function). I am trying to get the earnings per share and the price-to-earnings ratio, but I keep getting the error below. How would I fix this? At the very least, I need jsoup to be able to access the HTML. Essentially, I want the code to output 15.62, the Forward P/E (1y) value.

This is my code:

import org.jsoup.*;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

class Main {
    static Document document;

    public static void main(String[] args) throws java.io.IOException {
        Document doc = Jsoup.connect("https://www.nasdaq.com/symbol/aapl").get();
        Elements elements = doc.select("div#table-table fontS14px");
        System.out.println(elements.get(1).getAllElements().get(0).toString());
    }
}

This is the error message:

Exception in thread "main" java.net.SocketTimeoutException: Read timed out
    at java.base/java.net.SocketInputStream.socketRead0(Native Method)
    at java.base/java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
    at java.base/java.net.SocketInputStream.read(SocketInputStream.java:171)
    at java.base/java.net.SocketInputStream.read(SocketInputStream.java:141)
    at java.base/java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
    at java.base/java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
    at java.base/java.io.BufferedInputStream.read(BufferedInputStream.java:345)
    at java.base/sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:746)
    at java.base/sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:689)
    at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1604)
    at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1509)
    at java.base/java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:527)
    at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:750)
    at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:722)
    at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:306)
    at org.jsoup.helper.HttpConnection.get(HttpConnection.java:295)

This is the HTML that I am trying to read:

<div class="table-table fontS14px">

                <div class="table-row" style="">
                    <div class="table-cell">
                        <b>P/E Ratio</b>
                    </div>
                    <div class="table-cell">
                        17.23
                    </div>
                </div>

                <div class="table-row" style="">
                    <div class="table-cell">
                        <b>Forward P/E (1y)</b>
                    </div>
                    <div class="table-cell">
                        15.62
                    </div>
                </div>

                <div class="table-row" style="">
                    <div class="table-cell">
                        <b>Earnings Per Share (EPS)</b>
                    </div>
                    <div class="table-cell">
                        $&nbsp;11.87
                    </div>
                </div>

                <div class="table-row">
                    <div class="table-cell">
                        <b>Annualized Dividend</b>
                    </div>
                    <div class="table-cell">
                        $ 2.92
                    </div>
                </div>
                <div class="table-row">
                    <div class="table-cell">
                        <b>Ex Dividend Date</b>
                    </div>
                    <div class="table-cell">
                        11/8/2018
                    </div>
                </div>
                <div class="table-row">
                    <div class="table-cell">
                        <b>Dividend Payment Date</b>
                    </div>
                    <div class="table-cell">
                        11/15/2018
                    </div>
                </div>
                <div class="table-row">
                    <div class="table-cell">
                        <b>Current Yield</b>
                    </div>
                    <div class="table-cell">
                        1.39 %
                    </div>
                </div>
                <div class="table-row" style="">
                    <div class="table-cell">
                        <b>Beta</b>
                    </div>
                    <div class="table-cell">
                        1.02
                    </div>
                </div>
            </div>

Solution

  • The request fails because the URL redirects several times before the page is served, and jsoup only fetches the static HTML (it does not execute JavaScript or behave like a browser). That is why you get the timeout every time.

    The problem is specific to Nasdaq, so if you want to retrieve stock information for any ticker, I highly recommend crawling Yahoo Finance instead, because it works better. If you just want the data, there are many wrappers, such as fix-yahoo-finance for Python or Java Finance Quotes for Java.

    I have a Nasdaq ETF crawler written in Java, but it is in a private repository on GitHub. If you need it, ask me and I'll invite you to the repo.

    Hope it helped you! Feel free to ask for anything else!
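
For the jsoup side, here is a minimal sketch, not a guaranteed fix for the timeout. It shows two things. First, the selector in the question, `div#table-table fontS14px`, looks for an element with the id `table-table` and a descendant tag named `fontS14px`, so it would match nothing even after a successful fetch; classes are selected with `.` instead. Second, it shows where a longer timeout, a user agent, and redirect handling can be set on the connection. The class name `StockTableSketch`, the helpers `valueFor` and `fetch`, the user-agent string, and the 30-second timeout are all illustrative assumptions, and the inline HTML is a trimmed copy of the snippet from the question. If the site refuses non-browser clients or builds the table with JavaScript, `fetch` may still fail or return a page without the table.

```java
import java.io.IOException;
import java.util.Optional;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class StockTableSketch {

    // Hypothetical helper: returns the text of the value cell in the row
    // whose label cell matches `label`, or empty if no such row exists.
    static Optional<String> valueFor(Document doc, String label) {
        // '.' selects by class; '#' would select by id.
        Elements rows = doc.select("div.table-table.fontS14px div.table-row");
        for (Element row : rows) {
            Elements cells = row.select("div.table-cell");
            if (cells.size() >= 2 && cells.get(0).text().equals(label)) {
                return Optional.of(cells.get(1).text());
            }
        }
        return Optional.empty();
    }

    // Hypothetical helper: fetch a page with an explicit timeout and user agent.
    // The values are assumptions; the site may still time out or block the request.
    static Document fetch(String url) throws IOException {
        return Jsoup.connect(url)
                .userAgent("Mozilla/5.0")
                .timeout(30_000)        // milliseconds
                .followRedirects(true)  // already jsoup's default, shown for clarity
                .get();
    }

    public static void main(String[] args) {
        // Trimmed copy of the HTML from the question, parsed offline.
        String html =
              "<div class=\"table-table fontS14px\">"
            + "<div class=\"table-row\"><div class=\"table-cell\"><b>P/E Ratio</b></div>"
            + "<div class=\"table-cell\"> 17.23 </div></div>"
            + "<div class=\"table-row\"><div class=\"table-cell\"><b>Forward P/E (1y)</b></div>"
            + "<div class=\"table-cell\"> 15.62 </div></div>"
            + "</div>";

        Document doc = Jsoup.parse(html);
        System.out.println(valueFor(doc, "Forward P/E (1y)").orElse("not found"));
    }
}
```

Looking rows up by their label rather than by position (`elements.get(1)`) keeps the code working if the site reorders the table. To try it against the live page, replace `Jsoup.parse(html)` with `fetch("https://www.nasdaq.com/symbol/aapl")` and declare `throws IOException` on `main`.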