I'm using JSoup and have the following HTML:
<html><head>
<title>My Soup Materials</title>
<!--mstheme--><link rel="stylesheet" type="text/css" href="../../_themes/ice/ice1011.css"><meta name="Microsoft Theme" content="ice 1011, default">
</head>
<body><center><table width="92%"><tbody>
<tr>
<td><h2>My Soup Materials</h2>
<table width="100%%" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td align="left"><b>Origin:</b> Belgium</td>
<td align="left"><b>Count:</b> 2 foos</td>
</tr>
<tr>
<td align="left"><b>Supplier:</b> </td>
<td align="left"><b>Must Burninate:</b> Yes</td>
</tr>
<tr>
<td align="left"><b>Type:</b> Fizzbuzz</td>
<td align="left"><b>Add Afterwards:</b> No</td>
</tr>
</tbody>
</table>
<br>
<b><u>Notes</b></u><br>Drink more ovaltine</td>
</tr>
</tbody>
</table>
</center></body>
</html>
When I run this code:
String htmlString = "<html><head><title>My Soup Materials</title><!--mstheme--><link rel=\"stylesheet\" type=\"text/css\" href=\"../../_themes/ice/ice1011.css\"><meta name=\"Microsoft Theme\" content=\"ice 1011, default\"></head><body><center><table width=\"92%\"><tbody><tr><td><h2>My Soup Materials</h2><table width=\"100%%\" cellspacing=\"0\" cellpadding=\"0\"><tbody><tr><td align=\"left\"><b>Origin:</b> Belgium</td><td align=\"left\"><b>Count:</b> 2 foos</td></tr><tr><td align=\"left\"><b>Supplier:</b> </td><td align=\"left\"><b>Must Burninate:</b> Yes</td></tr><tr><td align=\"left\"><b>Type:</b> Fizzbuzz</td><td align=\"left\"><b>Add Afterwards:</b> No</td></tr></tbody></table><br><b><u>Notes</b></u><br>Drink more ovaltine</td></tr></tbody></table></center></body></html>";
Document document = Jsoup.parse(htmlString);
Elements allTables = document.select("table");
Element table = allTables.get(0);
The allTables collection has a size of 0 and is empty (no children, no properties, etc.), so when I try to get the first table I get an IndexOutOfBoundsException. Why? I would have expected it to have lots of children, starting with "<h2>My Soup Materials</h2>", etc.
This most likely comes down to something in your configuration, or to a difference between the Java code you are actually running and the code in your question. I ran the exact four lines of Java code from the question and checked allTables.size(), which returned 2.
I used Java 17 and JSoup 1.14.3, the newest version at the time of writing.
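If you want to compare your setup against this one, a quick check is to print which JSoup jar is actually loaded at runtime. This is a minimal sketch, not a confirmed diagnosis: the idea that a different jsoup jar on your classpath explains the 0 result is only a hypothesis, and the class and method names (JsoupVersionCheck, jsoupLocation, jsoupVersion) are made up for illustration. The manifest version may print null if the jar does not declare an Implementation-Version.

```java
import org.jsoup.Jsoup;

import java.security.CodeSource;

public class JsoupVersionCheck {
    // Where the org.jsoup.Jsoup class was loaded from (usually the path of the jsoup jar).
    // Returns "unknown" if the JVM does not expose a code source location.
    static String jsoupLocation() {
        CodeSource source = Jsoup.class.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return "unknown";
        }
        return source.getLocation().toString();
    }

    // Implementation-Version from the jar manifest; may be null if the manifest
    // does not declare one.
    static String jsoupVersion() {
        Package pkg = Jsoup.class.getPackage();
        return pkg == null ? null : pkg.getImplementationVersion();
    }

    public static void main(String[] args) {
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("jsoup loaded from " + jsoupLocation());
        System.out.println("jsoup Implementation-Version = " + jsoupVersion());
    }
}
```

If the printed location is not the jar you expect, resolving that is a reasonable first step before looking at the parsing code itself.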
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class App {
    public static void main(String[] args) {
        String htmlString = "<html><head><title>My Soup Materials</title><!--mstheme--><link rel=\"stylesheet\" type=\"text/css\" href=\"../../_themes/ice/ice1011.css\"><meta name=\"Microsoft Theme\" content=\"ice 1011, default\"></head><body><center><table width=\"92%\"><tbody><tr><td><h2>My Soup Materials</h2><table width=\"100%%\" cellspacing=\"0\" cellpadding=\"0\"><tbody><tr><td align=\"left\"><b>Origin:</b> Belgium</td><td align=\"left\"><b>Count:</b> 2 foos</td></tr><tr><td align=\"left\"><b>Supplier:</b> </td><td align=\"left\"><b>Must Burninate:</b> Yes</td></tr><tr><td align=\"left\"><b>Type:</b> Fizzbuzz</td><td align=\"left\"><b>Add Afterwards:</b> No</td></tr></tbody></table><br><b><u>Notes</b></u><br>Drink more ovaltine</td></tr></tbody></table></center></body></html>";
        Document document = Jsoup.parse(htmlString);
        Elements allTables = document.select("table"); // matches the outer table and the nested one
        Element table = allTables.get(0);              // the outer table; no exception since the list is non-empty
        System.out.println(allTables.size());          // prints 2
    }
}
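On the expectation in the question that the first table should have "lots of children, starting with <h2>": in JSoup, children() returns only the direct child elements, and select("table") also matches nested tables. The sketch below uses a shortened copy of the question's HTML to show this. TableStructure and its helper methods are hypothetical names chosen for illustration. It also uses selectFirst, which returns null instead of throwing when nothing matches.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class TableStructure {
    // Shortened version of the question's HTML: an outer table whose cell
    // contains an <h2> and a nested inner table.
    static final String HTML = "<table width=\"92%\"><tbody><tr><td><h2>My Soup Materials</h2>"
            + "<table><tbody><tr><td><b>Origin:</b> Belgium</td></tr></tbody></table>"
            + "</td></tr></tbody></table>";

    // Number of <table> elements anywhere in the document, nested ones included.
    static int tableCount(String html) {
        return Jsoup.parse(html).select("table").size();
    }

    // Tag names of the first table's direct children only (children() is not a descendant search).
    static List<String> firstTableChildTags(String html) {
        Element outer = Jsoup.parse(html).selectFirst("table"); // null if there is no table
        List<String> tags = new ArrayList<>();
        if (outer != null) {
            for (Element child : outer.children()) {
                tags.add(child.tagName());
            }
        }
        return tags;
    }

    // Text of the first <h2> inside the first table, found with a descendant search
    // (table > tbody > tr > td > h2). Returns null if either element is missing.
    static String firstTableHeading(String html) {
        Element outer = Jsoup.parse(html).selectFirst("table");
        if (outer == null) {
            return null;
        }
        Element h2 = outer.selectFirst("h2");
        return h2 == null ? null : h2.text();
    }

    public static void main(String[] args) {
        System.out.println(tableCount(HTML));          // 2
        System.out.println(firstTableChildTags(HTML)); // [tbody]
        System.out.println(firstTableHeading(HTML));   // My Soup Materials
    }
}
```

So even when parsing works, the first table's only direct child is the tbody; the h2 is reachable with select or selectFirst rather than children().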