java html parsing web-scraping jsoup

Parsing HTML tables using jsoup


I am parsing tables using jsoup. I need to read the division standings tables from this website: https://www.basketball-reference.com/leagues/NBA_2006.html. I want to use the same method for every division standings table, but the id is different for older seasons (e.g. id="divs_standings_W", id="divs_standings_E" and id="divs_standings_"). Here is a link to an older season: https://www.basketball-reference.com/leagues/NBA_1950.html.

How can I check whether a table with a given id exists and, if it does, store it in a variable named table? I don't have much relevant code yet:

Document doc = Jsoup.connect("https://www.basketball-reference.com/leagues/NBA_1950.html").get();
Elements table = doc.select("table[id=\"divs_standings_\"]");

Solution

  • You can use attribute prefix matching. The selector table[id^="divs_standings_"] matches every table whose id starts with divs_standings_. With selectFirst you get the first such table, or null if no table matches:

    Document doc = Jsoup.connect("https://www.basketball-reference.com/leagues/NBA_1950.html").get();
    Element table = doc.selectFirst("table[id^=\"divs_standings_\"]");
    

    This works for both old and new seasons.

    To wrap this in a method, you can use something like this:

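    import java.io.IOException;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;

    private static void processTable(String url) throws IOException {
        Document doc = Jsoup.connect(url).get();
        Element table = doc.selectFirst("table[id^=\"divs_standings_\"]");
        System.out.println(table);
    }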
    

    and call it with both URLs:

    processTable("https://www.basketball-reference.com/leagues/NBA_1950.html");
    processTable("https://www.basketball-reference.com/leagues/NBA_2006.html");
    

    You can also use regex matching with the [attr~=regex] selector if your ids are more complex. The jsoup selector syntax documentation describes it.
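
As a minimal sketch of the selectors discussed above, the class below checks whether a standings table exists, collects the ids of all matching tables, and narrows the match with a regex selector. The class name StandingsTables, its constants and helper methods, and the inline sample HTML are hypothetical and only for illustration; the real pages may be structured differently. It assumes jsoup is on the classpath and parses strings so that it runs without a network connection.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of jsoup or of the original answer.
final class StandingsTables {
    // Prefix selector from the answer: any table whose id starts with "divs_standings_".
    static final String PREFIX_SELECTOR = "table[id^=\"divs_standings_\"]";
    // Illustrative regex selector: only ids that end in E or W after the prefix.
    static final String CONFERENCE_SELECTOR = "table[id~=^divs_standings_(E|W)$]";

    // selectFirst returns null when nothing matches, so a null check
    // is enough to test whether a standings table exists.
    static boolean hasStandingsTable(Document doc) {
        return doc.selectFirst(PREFIX_SELECTOR) != null;
    }

    // Collects the id of every table matched by the given selector, in document order.
    static List<String> tableIds(Document doc, String selector) {
        List<String> ids = new ArrayList<>();
        for (Element table : doc.select(selector)) {
            ids.add(table.id());
        }
        return ids;
    }

    public static void main(String[] args) {
        // Made-up sample markup that only mimics the ids mentioned in the question.
        Document modern = Jsoup.parse(
                "<table id=\"divs_standings_E\"></table>"
              + "<table id=\"divs_standings_W\"></table>"
              + "<table id=\"other\"></table>");
        Document old = Jsoup.parse("<table id=\"divs_standings_\"></table>");

        System.out.println(hasStandingsTable(modern));
        System.out.println(tableIds(modern, PREFIX_SELECTOR));
        System.out.println(tableIds(old, PREFIX_SELECTOR));
        System.out.println(tableIds(old, CONFERENCE_SELECTOR));
    }
}
```

On a live page you would pass the Document returned by Jsoup.connect(url).get() instead of a parsed string. Using select rather than selectFirst matters for seasons that have more than one standings table, since selectFirst only returns the first match.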