I have multiple files from which I have to extract tables containing data. The problem is that the tables don't have IDs, so I have to search based on their content (which is constant in each file). There are multiple tables in each file, and the table of interest doesn't have a constant XPath.
<table border="0" cellspacing="0" cellpadding="0" style="BORDER-COLLAPSE: collapse" bordercolor="#111111">
<tbody>
<tr>
<td class="s">CONSTANT_TEXT</td>
<td class="l">CHANGING_VALUE</td>
</tr>
<tr>
<td class="s"> </td>
<td class="l"><a style="" id="CONSTANT_ID" href="mailto:XXXX">XXXX</a>
</td>
</tr>
</tbody>
</table>
How do I:
1. Search based on CONSTANT_TEXT and return the value of the 2nd TD (CHANGING_VALUE) without knowing the path (the table doesn't have an ID and its position changes from file to file)?
2. Search based on CONSTANT_TEXT and return the parent table of that TD?
What I did was search for the constant element (here the anchor with CONSTANT_ID) with Html Agility Pack, then walk its XPath upwards until the table is reached.
var output = document.DocumentNode.SelectNodes("//a[@id='CONSTANT_ID']");
// output[0].XPath evaluates to:
// "/html[1]/body[1]/table[1]/thead[1]/tr[1]/td[1]/table[1]/tbody[1]/tr[2]/td[2]/a[1]"
My plan was to iterate over each result, find the innermost table in its XPath (the last table[1] in the path), and then extract the data.
Thanks, Mike
Strictly speaking, you'll need the following XPath expressions:
Search based on CONSTANT_TEXT, return the value of the 2nd TD (CHANGING_VALUE):
//td[.="CONSTANT_TEXT"]/following-sibling::td[1]/text()
Output: CHANGING_VALUE
Search based on CONSTANT_TEXT, return the parent table of that TD:
//td[.="CONSTANT_TEXT"]/ancestor::table[1]
Output: the <table> element
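Since the question uses Html Agility Pack, here is a rough sketch of how the two expressions might be plugged into it. The sample HTML string and the variable names are made up for illustration. Note that the sketch selects the TD element and reads InnerText rather than ending the path in /text(), and it uses single quotes inside the XPath so it sits comfortably in a C# string. It also assumes the cell's text is exactly CONSTANT_TEXT with no surrounding whitespace, which is what the .= comparison requires.

```csharp
using System;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Hypothetical sample input, modelled on the table in the question.
        const string html = @"<html><body>
<table><tr><td>other</td></tr></table>
<table>
  <tbody>
    <tr><td class=""s"">CONSTANT_TEXT</td><td class=""l"">CHANGING_VALUE</td></tr>
  </tbody>
</table>
</body></html>";

        var document = new HtmlDocument();
        document.LoadHtml(html);

        // 1. The TD immediately after the one whose text is CONSTANT_TEXT.
        HtmlNode valueCell = document.DocumentNode.SelectSingleNode(
            "//td[.='CONSTANT_TEXT']/following-sibling::td[1]");

        // 2. The nearest table that encloses the matching TD.
        HtmlNode table = document.DocumentNode.SelectSingleNode(
            "//td[.='CONSTANT_TEXT']/ancestor::table[1]");

        // SelectSingleNode returns null when nothing matches, so guard first.
        if (valueCell != null)
            Console.WriteLine(valueCell.InnerText.Trim());
        if (table != null)
            Console.WriteLine(table.OuterHtml);
    }
}
```

If a file can contain CONSTANT_TEXT more than once, SelectNodes with the same expressions should return every match instead of only the first one; like SelectSingleNode, it returns null when there is no match.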