I only need to count the tr elements that are direct rows of the outer table (the children), not the tr elements nested inside the inner tables (the grandchildren). My current count is 8, but the result I want is 2. I am a beginner; how can I solve this problem?
from lxml import etree
html_string = '''
<!DOCTYPE html>
<html lang="en">
<head>
<title>title</title>
</head>
<body>
<div class="books">
<table width="100%" cellspacing="0" cellpadding="0" border="0">
<tbody>
<tr> <!-- want to count -->
<td><p class="en">name:</p>
</td>
<td>
<table width="780" cellspacing="0" cellpadding="0" border="0" class="noComma">
<tbody>
<tr>……</tr>
<tr>……</tr>
<tr>……</tr>
</tbody>
</table>
</td>
</tr>
<tr> <!-- want to count -->
<td style="width: 200px" class="left_title">
<p class="en">name:</p>
</td>
<td>
<table width="780" cellspacing="0" cellpadding="0" border="0" class="noComma">
<tbody>
<tr>……</tr>
<tr>……</tr>
<tr>……</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</div>
</body>
</html>
'''
html = etree.HTML(html_string)
trs = html.xpath('//tr')
print(len(trs))
This prints 8, but the result I want is 2.
Use:
trs = html.xpath('//tr[not(ancestor::td)]')
That returns only those tr elements that don't have a td ancestor, i.e. rows that are not nested inside a cell of another table.
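To see the difference between the two expressions, here is a minimal sketch. It assumes a cut-down version of the markup from the question (two outer rows, each holding a nested table with three rows); the cell contents are placeholders, not the real data.

```python
from lxml import etree

# Reduced, hypothetical version of the question's markup:
# two outer rows, each containing a nested table with three rows.
html_string = '''
<div class="books">
<table>
<tbody>
<tr><td>name:</td><td><table><tbody>
<tr><td>a</td></tr><tr><td>b</td></tr><tr><td>c</td></tr>
</tbody></table></td></tr>
<tr><td>name:</td><td><table><tbody>
<tr><td>d</td></tr><tr><td>e</td></tr><tr><td>f</td></tr>
</tbody></table></td></tr>
</tbody>
</table>
</div>
'''

html = etree.HTML(html_string)

# '//tr' matches every row at any depth, including nested ones.
all_rows = html.xpath('//tr')

# The predicate drops any row that sits inside a td of another table.
outer_rows = html.xpath('//tr[not(ancestor::td)]')

print(len(all_rows))    # 8
print(len(outer_rows))  # 2
```

The predicate works here because every nested row lives inside a td of the outer table, while the outer rows have no td above them.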
Or be more explicit:
trs = html.xpath("//div[@class='books']/table/tbody/tr")
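A rough sketch of the explicit variant, again on a reduced, made-up version of the markup. One caveat worth knowing: as far as I know, lxml's HTML parser does not add a tbody element the way browsers do, so the explicit path only matches when the source itself contains tbody. The union expression at the end is one way to tolerate both cases; treat it as a suggestion rather than the only option.

```python
from lxml import etree

# Reduced, hypothetical markup: two outer rows, each with a nested table.
html_string = '''
<div class="books">
<table><tbody>
<tr><td>name:</td><td><table><tbody>
<tr><td>a</td></tr><tr><td>b</td></tr>
</tbody></table></td></tr>
<tr><td>name:</td><td><table><tbody>
<tr><td>c</td></tr><tr><td>d</td></tr>
</tbody></table></td></tr>
</tbody></table>
</div>
'''

html = etree.HTML(html_string)

# Explicit path: only rows directly under the outer table's tbody.
explicit_rows = html.xpath("//div[@class='books']/table/tbody/tr")

# Alternative: find the outer table first, then query relative to it.
outer_table = html.xpath("//div[@class='books']/table")[0]

# Matches direct rows whether or not the source has a tbody wrapper.
tolerant_rows = outer_table.xpath("./tr | ./tbody/tr")

# XPath count() comes back as a float in lxml, so convert it to int.
row_count = int(outer_table.xpath("count(./tbody/tr)"))

print(len(explicit_rows), len(tolerant_rows), row_count)
```

The explicit path is easier to read and does not depend on how the nesting is done, but it breaks if the page structure changes; the ancestor predicate is shorter but assumes nested rows always sit inside a td.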