For the purpose of my question, I have created a simple HTML page, an extract of which is the following:
<table class="fruit-vegetables">
<thead>
<th>Fruit</th>
<th>Vegetables</th>
</thead>
<tbody>
<tr>
<td>
<b>
<a href="https://en.wikipedia.org/wiki/Apple" title="Apples">Apples</a>
</b>
</td>
<td>
<a href="https://en.wikipedia.org/wiki/Carrot" title="Carrots">Carrots</a>
</td>
</tr>
<tr>
<td>
<i>
<a href="https://en.wikipedia.org/wiki/Orange_%28fruit%29" title="Oranges">Oranges</a>
</i>
</td>
<td>
<a href="https://en.wikipedia.org/wiki/Pea" title="Peas">Peas</a>
</td>
</tr>
</tbody>
</table>
I want to extract the data from the first column called "Fruit" using Jsoup. Thus, the result should be:
Apples
Oranges
I have written a program, an extract of which is the following:
// In reality, the document would be fetched with Jsoup.connect(url).get().
// Here, suppose that the String `html` holds the full source code of the page.
Document doc = Jsoup.parse(html);
Elements table = doc.select("table.fruit-vegetables").select("tbody").select("tr").select("td").select("a");
for (Element element : table) {
    System.out.println(element.text());
}
The result of this program is:
Apples
Carrots
Oranges
Peas
I know that something is not working correctly, but I can't find my mistake. None of the other questions here on Stack Overflow solved my problem. What do I have to do?
You seem to be looking for
Elements el = doc.select("table.fruit-vegetables td:eq(0)");
for (Element e : el) {
    System.out.println(e.text());
}
At http://jsoup.org/cookbook/extracting-data/selector-syntax you can find the description of `:eq(n)`:
`:eq(n)`: find elements whose sibling index is equal to `n`; e.g. `form input:eq(1)`
So with `td:eq(0)` we are selecting each `<td>` which is the first child of its parent, in this case `<tr>`.
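To try this end to end, here is a minimal sketch of a complete program. It assumes jsoup is on the classpath; the class name `ColumnExtractor`, the method `columnText`, and the shortened `HTML` constant are only illustrative and not part of the original code. The column index is passed in, so the same selector can read either column.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class ColumnExtractor {

    // Shortened copy of the table from the question (hypothetical constant, for the demo only).
    static final String HTML =
            "<table class=\"fruit-vegetables\">"
          + "<thead><tr><th>Fruit</th><th>Vegetables</th></tr></thead>"
          + "<tbody>"
          + "<tr>"
          + "<td><b><a href=\"https://en.wikipedia.org/wiki/Apple\">Apples</a></b></td>"
          + "<td><a href=\"https://en.wikipedia.org/wiki/Carrot\">Carrots</a></td>"
          + "</tr>"
          + "<tr>"
          + "<td><i><a href=\"https://en.wikipedia.org/wiki/Orange_%28fruit%29\">Oranges</a></i></td>"
          + "<td><a href=\"https://en.wikipedia.org/wiki/Pea\">Peas</a></td>"
          + "</tr>"
          + "</tbody>"
          + "</table>";

    // Returns the text of every <td> whose sibling index equals columnIndex (0-based).
    // Header cells are <th>, so the selector does not match them.
    static List<String> columnText(String html, int columnIndex) {
        Document doc = Jsoup.parse(html);
        List<String> cells = new ArrayList<>();
        for (Element td : doc.select("table.fruit-vegetables td:eq(" + columnIndex + ")")) {
            cells.add(td.text());
        }
        return cells;
    }

    public static void main(String[] args) {
        // Prints the "Fruit" column, one cell per line.
        for (String cell : columnText(HTML, 0)) {
            System.out.println(cell);
        }
    }
}
```

Calling `columnText(HTML, 1)` instead should give the "Vegetables" column. The original attempt printed all four names because `select("td").select("a")` matches the links in every cell and never restricts the match to a column.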