
How to prevent a read timeout while scraping data with jsoup in Java?


I am learning how to scrape data from a website using jsoup in Java. On the first try I got the expected output, but when I ran the program again it failed with an error. Here is my code:

package solution;

import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class WebScraper {

    public static void main(String[] args) throws IOException {

        Document d=Jsoup.connect("https://www.wikihow.com/wikiHowTo?search=adjust+bass+on+computerr").timeout(6000).get();
        Elements ele=d.select("div#searchresults_list");
        for (Element element : ele.select("div.result")) {
            // Text of the result title (not an image URL)
            String title = element.select("div.result_title").text();
            System.out.println(title);
        }

    }
}

Here is the error message I got:

Exception in thread "main" java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
    at java.net.SocketInputStream.read(SocketInputStream.java:171)
    at java.net.SocketInputStream.read(SocketInputStream.java:141)
    at sun.security.ssl.SSLSocketInputRecord.read(SSLSocketInputRecord.java:466)
    at sun.security.ssl.SSLSocketInputRecord.readHeader(SSLSocketInputRecord.java:460)
    at sun.security.ssl.SSLSocketInputRecord.decode(SSLSocketInputRecord.java:159)
    at sun.security.ssl.SSLTransport.decode(SSLTransport.java:110)
    at sun.security.ssl.SSLSocketImpl.decode(SSLSocketImpl.java:1198)
    at sun.security.ssl.SSLSocketImpl.readHandshakeRecord(SSLSocketImpl.java:1107)
    at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:400)
    at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:372)
    at sun.net.www.protocol.https.HttpsClient.afterConnect(HttpsClient.java:587)
    at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:185)
    at sun.net.www.protocol.https.HttpsURLConnectionImpl.connect(HttpsURLConnectionImpl.java:167)
    at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:732)
    at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:707)
    at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:297)
    at org.jsoup.helper.HttpConnection.get(HttpConnection.java:286)
    at solution.WebScraper.main(WebScraper.java:14)

Process finished with exit code 1

Can anyone help?

P.S. edit:

After solving this issue, I found that the following approaches help:

  1. Give the timeout parameter a higher value, e.g. 8000 ms instead of the previous 6000 ms.

  2. Make sure your internet connection is stable.

Thanks to everyone who gave advice on this problem.


Solution

  • Your internet connection may be very slow, so check it first.

    Alternatively, open the URL in a browser and check how long the HTML takes to load.

    Also, wrap the request in a try-catch block so that a timeout is handled instead of crashing the program.
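
    The try-catch advice can be combined with a simple retry. The following is only a minimal sketch, not part of jsoup: the class RetryFetch, the interface IoCall, and the method withRetry are hypothetical names introduced here for illustration. It retries an action only when it fails with SocketTimeoutException (the exception shown in the question's stack trace) and lets any other IOException propagate.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;

// Hypothetical helper (not part of jsoup): retries an I/O action that timed out.
final class RetryFetch {

    /** An action that may fail with an IOException, e.g. a jsoup request. */
    interface IoCall<T> {
        T call() throws IOException;
    }

    /**
     * Runs the action up to maxAttempts times, sleeping pauseMillis between
     * attempts. Only SocketTimeoutException triggers a retry; any other
     * IOException propagates immediately.
     */
    static <T> T withRetry(IoCall<T> action, int maxAttempts, long pauseMillis)
            throws IOException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        SocketTimeoutException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (SocketTimeoutException e) {
                last = e;
                System.err.println("Attempt " + attempt + " timed out: " + e.getMessage());
                if (attempt < maxAttempts) {
                    try {
                        Thread.sleep(pauseMillis); // wait before the next attempt
                    } catch (InterruptedException ie) {
                        Thread.currentThread().interrupt(); // preserve interrupt status
                        throw new IOException("Interrupted while waiting to retry", ie);
                    }
                }
            }
        }
        throw last; // every attempt timed out
    }
}
```

    Assuming jsoup is on the classpath, the request from the question could then be written as Document d = RetryFetch.withRetry(() -> Jsoup.connect(url).timeout(8000).get(), 3, 1000); where url is the search URL and the values 3 and 1000 are arbitrary choices for the number of attempts and the pause in milliseconds.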