java html dom jsoup

Select a particular HTML table with JSOUP


This is my code:

import java.io.IOException;
import org.jsoup.Jsoup;

public static void main(String[] args) throws IOException {

    org.jsoup.nodes.Document doc = Jsoup.connect("https://ms.wikipedia.org/wiki/Malaysia").get();
    // Matches every <tr> in the document, whichever table it belongs to
    org.jsoup.select.Elements rows = doc.select("tr");
    for (org.jsoup.nodes.Element row : rows) {
        org.jsoup.select.Elements columns = row.select("td");
        for (org.jsoup.nodes.Element column : columns) {
            System.out.print(column.text());
        }
        System.out.println();
    }

}

It prints every table row on the webpage. Is it possible to print only the rows of one selected table?


Solution

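  • A good way to do this is to grab the table by its title. The title is a span inside the heading that precedes the table, so the table is the next sibling of the span's parent, not of the span itself. Since a plain CSS selector cannot step from an element up to its parent, you can combine a CSS selector with Jsoup traversal calls to achieve this: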

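    import java.io.IOException;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    public static void main(String[] args) throws IOException {
        Document doc = Jsoup.connect("https://ms.wikipedia.org/wiki/Malaysia").get();
        // span#Trivia sits inside the heading; the table is the heading's next sibling
        Element table = doc.select("span#Trivia").parents().first().nextElementSibling();
        // Search for rows inside this one table only
        Elements rows = table.select("tr");
        for (Element row : rows) {
            String header = row.select("th").text();
            String value = row.select("td").text();
            System.out.println(header + ": " + value);
        }
    }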
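
The same idea can be tried offline before pointing it at the live page. The following is only a minimal sketch: the class name `TableByHeading`, the helper `rowsAfterHeading`, and the inline `SAMPLE_HTML` are hypothetical stand-ins and are not taken from the real Wikipedia markup, which may differ. It also adds null checks, since the chained call in the answer would throw a NullPointerException if the span or the following element were missing.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class TableByHeading {

    // Hypothetical sample page: two tables, each preceded by a heading with a titled span.
    static final String SAMPLE_HTML =
            "<h2><span id=\"History\">History</span></h2>"
          + "<table><tr><th>Year</th><td>1957</td></tr></table>"
          + "<h2><span id=\"Trivia\">Trivia</span></h2>"
          + "<table>"
          + "<tr><th>Capital</th><td>Kuala Lumpur</td></tr>"
          + "<tr><th>Currency</th><td>Ringgit</td></tr>"
          + "</table>";

    // Returns "header: value" lines for the table that directly follows
    // the heading containing span#<spanId>; empty list if nothing matches.
    static List<String> rowsAfterHeading(Document doc, String spanId) {
        List<String> lines = new ArrayList<>();
        Element title = doc.selectFirst("span#" + spanId);
        if (title == null || title.parent() == null) {
            return lines; // no such title on the page
        }
        // Step up to the heading, then across to its next sibling element.
        Element table = title.parent().nextElementSibling();
        if (table == null || !table.tagName().equals("table")) {
            return lines; // the heading is not followed by a table
        }
        for (Element row : table.select("tr")) {
            lines.add(row.select("th").text() + ": " + row.select("td").text());
        }
        return lines;
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(SAMPLE_HTML);
        rowsAfterHeading(doc, "Trivia").forEach(System.out::println);
    }
}
```

Run against the sample markup, this prints only the two rows of the table under the "Trivia" heading and skips the "History" table. To use it on the real page, replace `Jsoup.parse(SAMPLE_HTML)` with the `Jsoup.connect(...).get()` call from the answer.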