c# html-agility-pack selectnodes

How can I get this text from h4?


(Sorry about my English, I'm Brazilian.)

I'm trying to get the InnerText of an h4 tag using HtmlAgilityPack. I managed to get this kind of value from 3 of the 4 tags I need on the website, but the last one is the most important, and it just returns an empty value.

Is it possible that the way the website was built requires a different way to get this value?

This is the specific h4 whose InnerText ("356.386.496,02") I'm trying to extract:

<h4 class="text-black--opacity-60 fs-20 fs-sm-42 fs-lg-40 w-100 mt-3">
<span class="align-middle fs-12 fs-lg-12 pr-4">R$</span>
"356.386.496,02"
</h4>

I've tried this:

HtmlDocument htmlDocument = new HtmlDocument();
htmlDocument.LoadHtml(data);

var nodes = htmlDocument.DocumentNode.SelectNodes("//h4[@class='text-black--opacity-60 fs-20 fs-sm-42 fs-lg-40 w-100 mt-3']");

foreach (var node in nodes)
{
    Console.WriteLine(node.InnerText);
}
//Result in console:
//=> 

Note that the SelectNodes method doesn't return null; it finds the h4 node perfectly, but the InnerText value is "".


Solution

  • Try replacing "356.386.496,02" with 356.386.496,02 (no quotes) or, inside a C# verbatim string, with ""356.386.496,02"" (doubled quotes).
    The following should work:

    using System;
    using HtmlAgilityPack;

    public class Program
    {
        public static void Main()
        {
            // Same markup as in the question; quotes are doubled inside the verbatim string.
            var html =
                @"<h4 class=""text-black--opacity-60 fs-20 fs-sm-42 fs-lg-40 w-100 mt-3"">
    <span class=""align-middle fs-12 fs-lg-12 pr-4"">R$</span>
    ""356.386.496,02""
    </h4>";

            var htmlDoc = new HtmlDocument();
            htmlDoc.LoadHtml(html);

            var htmlNodes = htmlDoc.DocumentNode.SelectNodes("//h4[@class='text-black--opacity-60 fs-20 fs-sm-42 fs-lg-40 w-100 mt-3']");

            // InnerText includes the span's "R$" as well as the number.
            foreach (var node in htmlNodes)
            {
                Console.WriteLine(node.InnerText);
            }
        }
    }
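
If only the number is wanted, without the "R$" from the nested span, one alternative worth trying is to select the h4's own text node directly. The sketch below is a minimal example under a few assumptions: `H4Text` and `ExtractAmount` are hypothetical names, the class match is loosened to `contains(@class, ...)` on one class name, and the quotes shown in the question may simply be how browser developer tools display a text node, so they are stripped if present.

```csharp
using System;
using HtmlAgilityPack;

public static class H4Text
{
    // Hypothetical helper: returns the text that sits directly inside the <h4>,
    // skipping the nested <span> that holds the currency symbol.
    // Returns null when no such text exists in the HTML that was loaded.
    public static string ExtractAmount(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // contains(@class, ...) tolerates reordered or extra class names;
        // the predicate on text() keeps only non-blank direct text nodes.
        var textNode = doc.DocumentNode.SelectSingleNode(
            "//h4[contains(@class, 'text-black--opacity-60')]/text()[normalize-space(.) != '']");

        if (textNode == null)
        {
            return null;
        }

        // Decode HTML entities, then strip whitespace and any literal quotes.
        return HtmlEntity.DeEntitize(textNode.InnerText).Trim().Trim('"');
    }

    public static void Main()
    {
        // Sample markup copied from the question, without the surrounding quotes.
        var html = @"<h4 class=""text-black--opacity-60 fs-20 fs-sm-42 fs-lg-40 w-100 mt-3"">
<span class=""align-middle fs-12 fs-lg-12 pr-4"">R$</span>
356.386.496,02
</h4>";

        Console.WriteLine(ExtractAmount(html) ?? "(no text found in the loaded HTML)");
    }
}
```

If the helper returns null for the real page even though the h4 is found, it may be worth checking whether `data` contains the number at all. One possibility (an assumption, since the question does not show the raw download) is that the site fills the value in with JavaScript after the page loads, in which case HtmlAgilityPack never sees it.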