I'm writing some additional classes for an existing GWT project. I need to:
The returned page is very simple HTML, so parsing it shouldn't be difficult; I just need to get the data first.
How do I do this in Java? Which packages should I be looking at?
With the native Java API, the easiest way to read from a URL is `java.net.URL#openStream()`. Here's a basic example:
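```java
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;

try (InputStream input = new URL("https://www.stackoverflow.com").openStream()) {
    String body = new String(input.readAllBytes(), StandardCharsets.UTF_8);
    System.out.println(body);
}
```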
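The same idea can be wrapped in a reusable method. This is a minimal sketch; the class name `UrlReader` is hypothetical. The charset is passed in explicitly because `openStream()` only hands you raw bytes and does not tell you which encoding the server used. Checked `IOException`s are wrapped in `UncheckedIOException`, which is a design choice, not a requirement. `URI.create(...).toURL()` is used because the `URL(String)` constructors are deprecated since Java 20.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.URL;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class UrlReader {

    /** Reads the whole resource at the given URL into a String using the given charset. */
    public static String read(URL url, Charset charset) {
        // try-with-resources closes the stream even if reading fails
        try (InputStream input = url.openStream()) {
            return new String(input.readAllBytes(), charset);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        // Avoids the URL(String) constructor, deprecated since Java 20
        URL url = URI.create("https://stackoverflow.com").toURL();
        System.out.println(read(url, StandardCharsets.UTF_8));
    }
}
```

Since `openStream()` works for any scheme the JDK supports, `read(path.toUri().toURL(), ...)` also reads local `file:` URLs, which is handy for testing without network access.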
You could feed the `InputStream` to any DOM/SAX parser of your choice. Most parsers accept an `InputStream` as an argument, directly or indirectly, and some even accept a URL. Jsoup is one of the better HTML parsers.
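As an illustration of handing such a stream to a parser, here is a minimal sketch using the JDK's built-in DOM parser (`javax.xml.parsers`). It assumes the page is well-formed XHTML: an XML parser rejects typical real-world HTML (unclosed `<br>`, unquoted attributes), which is exactly where a lenient HTML parser like Jsoup is the safer choice. The class name `DomTitleReader` and the in-memory sample page are hypothetical stand-ins for the stream from `URL#openStream()`. The `load-external-dtd` feature is specific to the Xerces-based parser bundled with the JDK; it keeps a DOCTYPE from triggering a download of the W3C DTD.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class DomTitleReader {

    /** Parses well-formed (X)HTML from the stream; returns the first <title> text, or null if absent. */
    public static String readTitle(InputStream in) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Do not fetch external DTDs referenced by a DOCTYPE (JDK/Xerces-specific feature)
            factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(in);
            NodeList titles = doc.getElementsByTagName("title");
            return titles.getLength() > 0 ? titles.item(0).getTextContent() : null;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse document", e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for the stream returned by URL#openStream(): a tiny XHTML page
        String page = "<html><head><title>Hello</title></head><body><p>Hi</p></body></html>";
        InputStream in = new ByteArrayInputStream(page.getBytes(StandardCharsets.UTF_8));
        System.out.println(readTitle(in)); // prints Hello
    }
}
```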
If you want more control and/or a more self-documenting API, then since Java 11 you can use `java.net.http.HttpClient`. It does get verbose quickly when you merely want the response body:
```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

HttpClient client = HttpClient.newBuilder().build();
HttpRequest request = HttpRequest.newBuilder().GET().uri(URI.create("https://stackoverflow.com")).build();
HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
String body = response.body();
System.out.println(body);
```
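In practice you may want to reuse one client, set timeouts, follow redirects and check the status code, because `send()` does not throw on a 404 or 500; it simply returns that response. A minimal sketch under those assumptions follows. The class name `PageFetcher` and the timeout values are hypothetical choices. `BodyHandlers.ofString()` decodes using the charset from the `Content-Type` header and falls back to UTF-8, so unlike `openStream()` you don't have to pick one yourself.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class PageFetcher {

    // An HttpClient is immutable once built and can be shared for many requests
    private static final HttpClient CLIENT = HttpClient.newBuilder()
            .followRedirects(HttpClient.Redirect.NORMAL) // follow redirects, except HTTPS -> HTTP
            .connectTimeout(Duration.ofSeconds(10))      // arbitrary example value
            .build();

    /** Fetches the body of the given URL, failing on any non-2xx status code. */
    public static String fetch(String url) {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .timeout(Duration.ofSeconds(30)) // arbitrary example value
                .GET()
                .build();
        try {
            HttpResponse<String> response = CLIENT.send(request, HttpResponse.BodyHandlers.ofString());
            int status = response.statusCode();
            if (status < 200 || status >= 300) {
                throw new IllegalStateException("HTTP " + status + " for " + url);
            }
            return response.body();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new IllegalStateException("Interrupted while fetching " + url, e);
        }
    }
}
```

Usage is then a one-liner: `String html = PageFetcher.fetch("https://stackoverflow.com");`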