Tags: java, html, http, jsoup, httpclient

Use data retrieved from HTTPClient into JSoup


I am using HTTPClient to connect to a website with the following snippet:

 byte[] responseBody = method.getResponseBody();
 System.out.println(new String(responseBody));

The code above prints the website's HTML. I only need some of the data in that HTML, and I was able to extract it with JSoup using the following snippet:

Document doc = Jsoup.connect(url).get();

In the code above I pass the website's address directly as "url", which means I do not need HTTPClient at all if I use JSoup. Is there a way to pass the "responseBody" retrieved with HTTPClient to JSoup, so that I do not have to call Document doc = Jsoup.connect(url).get(); ?

Thanks


Solution

  • You can parse the HTML directly through Jsoup#parse:

    Document doc = Jsoup.parse(new String(responseBody));

    I'm wary of converting the byte array straight to a String like this, because new String(byte[]) decodes with the platform's default charset, but in your case it should work fine.

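    Alternatively, you can use URLConnection, get a handle on its InputStream and pass the stream straight to Jsoup, which works out the charset itself:

    // Needs: java.io.InputStream, java.net.URL, java.net.URLConnection,
    // org.jsoup.Jsoup, org.jsoup.nodes.Document, org.jsoup.nodes.Element, org.jsoup.select.Elements
    URLConnection connection = new URL("http://www.stackoverflow.com").openConnection();
    try (InputStream inStream = connection.getInputStream()) {
        // getContentEncoding() returns the Content-Encoding header (e.g. "gzip"), not a charset,
        // so pass null and let Jsoup use the page's <meta> charset (falling back to UTF-8)
        Document document = Jsoup.parse(inStream, null, connection.getURL().toString());
        Elements els = document.select("tbody > tr > td");

        for (Element el : els) {
            System.out.println(el.text());
        }
    }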
    

    Would give:

    Stack Overflow Server Fault Super User Web Applications Ask Ubuntu Webmasters Game Development TeX - LaTeX
    Programmers Unix & Linux Ask Different (Apple) WordPress Answers Geographic Information Systems Electrical Engineering Android Enthusiasts Information Security
    Database Administrators Drupal Answers SharePoint User Experience Mathematica more (14)
    ...
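On the charset concern with new String(responseBody): that constructor decodes with the platform default charset, so non-ASCII text can come out garbled when the page uses a different encoding. The following is a minimal, stdlib-only sketch of decoding the body with the charset the server declares in its Content-Type header instead. The class HtmlBytes and its methods charsetFromContentType and decodeBody are hypothetical names for illustration, and falling back to UTF-8 when no usable charset is declared is an assumption. Note that URLConnection.getContentEncoding() returns the Content-Encoding header (for example gzip), not the charset; the charset parameter lives in getContentType().

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

class HtmlBytes {

    // Hypothetical helper: extract the charset parameter from a Content-Type header
    // such as "text/html; charset=ISO-8859-1". Returns the fallback when the header is
    // missing, has no charset parameter, or names an illegal/unsupported charset.
    static Charset charsetFromContentType(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String param : contentType.split(";")) {
            String p = param.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                // Strip optional quotes, e.g. charset="utf-8"
                String name = p.substring("charset=".length()).replace("\"", "").trim();
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Covers IllegalCharsetNameException and UnsupportedCharsetException
                    return fallback;
                }
            }
        }
        return fallback;
    }

    // Decode a response body with an explicit charset instead of new String(bytes).
    // UTF-8 as the fallback is an assumption, not something the server guarantees.
    static String decodeBody(byte[] body, String contentType) {
        return new String(body, charsetFromContentType(contentType, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // "cafe" with an e-acute, encoded as ISO-8859-1 (the accented letter is the single byte 0xE9)
        byte[] latin1Body = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);

        // Correct charset: the accented letter survives
        System.out.println(decodeBody(latin1Body, "text/html; charset=ISO-8859-1"));
        // Wrong charset: the lone 0xE9 byte is malformed UTF-8 and becomes a replacement character
        System.out.println(decodeBody(latin1Body, "text/html; charset=UTF-8"));
    }
}
```

The decoded String can then be handed to Jsoup.parse(html) as in the first suggestion. This sketch only looks at the HTTP header; a charset declared solely in the page's <meta> tag is not detected here.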