
C# and Html Agility Pack


I have multiple files from which I have to extract tables containing data. The problem is that the tables don't have IDs, so I have to search based on their content (which is constant in each file). There are multiple tables in each file, and the table of interest doesn't have a constant XPath.

<table border="0" cellspacing="0" cellpadding="0" style="BORDER-COLLAPSE: collapse" bordercolor="#111111">
    <tbody>
        <tr> 
            <td class="s">CONSTANT_TEXT</td>
            <td class="l">CHANGING_VALUE</td>
        </tr>

        <tr> 
            <td class="s"> </td>
            <td class="l"><a style="" id="CONSTANT_ID" href="mailto:XXXX">XXXX</a></td>
        </tr>
    </tbody>

</table>

How do I:

1. Search for CONSTANT_TEXT and return the value of the 2nd TD (CHANGING_VALUE) without knowing the path? The table doesn't have an ID, and its position changes from file to file.
2. Search for CONSTANT_TEXT and return the parent table of that TD?

What I did was search for the constant element with Html Agility Pack (here, the anchor with the constant ID), with the idea of walking its XPath upwards until the table is reached.

var output = document.DocumentNode.SelectNodes("//a[@id='CONSTANT_ID']");
// output[0].XPath == "/html[1]/body[1]/table[1]/thead[1]/tr[1]/td[1]/table[1]/tbody[1]/tr[2]/td[2]/a[1]"

My plan was to iterate over each output node, take its XPath up to the lowest table occurring in it (the last table[1]), and then extract the data.

Thanks, Mike


Solution

  • Strictly speaking, you'll need the following XPath expressions:

    Search based on CONSTANT_TEXT, return the value of the 2nd TD (CHANGING_VALUE):

    //td[.="CONSTANT_TEXT"]/following-sibling::td[1]/text()
    

    Output : CHANGING_VALUE

    Search based on CONSTANT_TEXT, return the parent table of that TD:

    //td[.="CONSTANT_TEXT"]/ancestor::table[1]
    

    Output : <table> element
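
For reference, here is a minimal sketch of how the two expressions above might be run from C# with Html Agility Pack. The class and method names (TableLookup, FindValue, FindTable) and the inline sample HTML are hypothetical, made up for illustration. The sketch also selects the sibling td element and reads its InnerText instead of selecting the text() node, which is a small variation on the first expression, and it assumes the label contains no single quote.

```csharp
using System;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

public static class TableLookup
{
    // Returns the trimmed text of the cell that follows the cell whose text
    // is exactly `label`, or null when no such cell exists.
    // Assumption: `label` contains no single quote (it is pasted into the XPath).
    public static string FindValue(HtmlDocument document, string label)
    {
        HtmlNode cell = document.DocumentNode.SelectSingleNode(
            "//td[.='" + label + "']/following-sibling::td[1]");
        return cell?.InnerText.Trim();
    }

    // Returns the nearest enclosing <table> of the matching cell, or null.
    public static HtmlNode FindTable(HtmlDocument document, string label)
    {
        return document.DocumentNode.SelectSingleNode(
            "//td[.='" + label + "']/ancestor::table[1]");
    }

    public static void Main()
    {
        // Hypothetical sample: the table of interest nested inside a layout table.
        var document = new HtmlDocument();
        document.LoadHtml(
            "<table><tr><td>" +
            "<table><tbody><tr><td class=\"s\">CONSTANT_TEXT</td>" +
            "<td class=\"l\">CHANGING_VALUE</td></tr></tbody></table>" +
            "</td></tr></table>");

        // Text of the second cell in the matching row.
        Console.WriteLine(FindValue(document, "CONSTANT_TEXT"));

        // The table node itself; its XPath shows where it sits in this file.
        HtmlNode table = FindTable(document, "CONSTANT_TEXT");
        Console.WriteLine(table?.XPath);
    }
}
```

SelectSingleNode returns null when nothing matches, so both helpers pass that through rather than throwing. If the cell text in your files carries leading or trailing whitespace, a predicate such as normalize-space(.)='CONSTANT_TEXT' may match more reliably than the exact comparison.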