c# web-scraping html-agility-pack

How do I use HtmlAgilityPack in C# to extract the following text from a website?


I need to extract the text "223M" from the structure below using HtmlAgilityPack in C#. How do I do it?

<div data-widget="quote-options-flow-summary" data-mode="desktop" data-symbol="AAPL">
    <div class="options-flow-summary">
        <div data-app="summary-element" data-id="overall_flow">
            <h2 class="summary-type">
                Overall Flow
                <i class="fas fa-info-circle" title="Determined based on positive or negative net premiums.">
                </i>
            </h2>
            <h3 data-app="summary-value" style="color:#009c3d">
                Bullish
            </h3>
        </div>
        <div data-app="summary-element" data-id="net_premium">
            <h2 class="summary-type">
                Net Premium
                <i class="fas fa-info-circle" title="Calculated value of calls premium minus put premium in the filtered Flow.">
                </i>
            </h2>
            <h3 data-app="summary-value" style="color:#009c3d">
                223M
            </h3>
        </div>
    </div>
</div>

I am new to HtmlAgilityPack, so I have no idea how to do anything with it. The data-app="summary-value" attribute on the h3 especially threw me off, as I don't know whether that is what I should refer to when I grab the text "223M".


Solution

  • Along the lines of:

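    // Requires: using HtmlAgilityPack;
    var htmlDoc = new HtmlDocument();
    htmlDoc.LoadHtml(html); // html holds the page markup as a string

    // InnerText includes the surrounding whitespace, so trim it to get "223M".
    string premium = htmlDoc.DocumentNode
        .SelectSingleNode("//div[@data-id='net_premium']")
        .SelectSingleNode("h3[@data-app='summary-value']")
        .InnerText.Trim();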

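    The strings passed to SelectSingleNode are XPath expressions. The first finds the div whose data-id attribute is net_premium anywhere in the document; the second selects that div's h3 child whose data-app attribute is summary-value.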

    https://html-agility-pack.net/select-single-node

    http://dotnetfiddle.net/pXItfm
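
The two lookups can also be combined into a single XPath expression, and since SelectSingleNode returns null when nothing matches, a null check avoids a NullReferenceException if the page layout changes. The following is a minimal sketch rather than a definitive implementation: the SummaryScraper class and GetSummaryValue helper are hypothetical names introduced here, and the inline HTML is a shortened copy of the markup from the question.

```csharp
using System;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

public static class SummaryScraper
{
    // Hypothetical helper: returns the trimmed summary value for the given
    // data-id, or null when no matching element exists in the markup.
    // Assumes dataId is a trusted value that contains no quote characters.
    public static string GetSummaryValue(string html, string dataId)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Single XPath: find the div by data-id, then its h3 child by data-app.
        var node = doc.DocumentNode.SelectSingleNode(
            $"//div[@data-id='{dataId}']/h3[@data-app='summary-value']");

        // SelectSingleNode returns null when nothing matches.
        return node?.InnerText.Trim();
    }

    public static void Main()
    {
        const string html = @"
<div data-app=""summary-element"" data-id=""net_premium"">
    <h2 class=""summary-type"">Net Premium</h2>
    <h3 data-app=""summary-value"" style=""color:#009c3d"">
        223M
    </h3>
</div>";

        Console.WriteLine(GetSummaryValue(html, "net_premium") ?? "(not found)");
    }
}
```

Passing "overall_flow" instead of "net_premium" against the full markup from the question would select the other summary block in the same way.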