
How do I read an HTML document in C#, given that I have the webpage source stored in a string variable?

I have tried to do this on my own but couldn't.

I have an HTML document, and I'm trying to extract the addresses of all the pictures in it into a C# collection, but I'm not sure of the syntax. I'm using HtmlAgilityPack. Here is what I have so far. Please advise.

The HTML Code is the following:

<div style='padding-left:12px;' id='myWeb123'>
<b>MyWebSite Pics</b>
<br /><br />
<img src="" alt='myWebSitePics' title='myWebSitePics' /><br /><br />
<img src="" alt='myWebSitePics' title='myWebSitePics' /><br /><br />
<img src="" alt='myWebSitePics' title='myWebSitePics' /><br /><br />
<img src="" alt='myWebSitePics' title='myWebSitePics' /><br /><br />
<img src="" alt='myWebSitePics' title='myWebSitePics' /><br /><br />
<img src="" alt='myWebSitePics' title='myWebSitePics' /><br /><br />
<img src="" alt='myWebSitePics' title='myWebSitePics' /><br /><br />
<img src="" alt='myWebSitePics' title='myWebSitePics' /><br /><br />
<img src="" alt='myWebSitePics' title='myWebSitePics' /><br /><br />
<img src="" alt='myWebSitePics' title='myWebSitePics' /><br /><br />
<a href="" target="_blank" rel="nofollow">Source</a>
</div>

And the C# code is the following:

HtmlAgilityPack.HtmlDocument document = new HtmlAgilityPack.HtmlDocument();


// Targets a specific node
HtmlNode someNode = document.GetElementbyId("myWeb123");

//HtmlNodeCollection linkNodes = document.DocumentNode.SelectNodes("//a[@href]");

HtmlNodeCollection linkNodes = document.DocumentNode.SelectNodes("//div[@id='myWeb123']");

if (linkNodes != null)
{
    int count = 0;
    foreach (HtmlNode linkNode in linkNodes)
    {
        string linkTitle = linkNode.GetAttributeValue("src", string.Empty);

        Debug.Print("linkTitle = " + linkTitle);

        if (linkTitle == string.Empty)
        {
            HtmlNode imageNode = linkNode.SelectSingleNode("img[@alt]");
            if (imageNode != null)
            {
                Debug.Print("imageNode = " + imageNode.Attributes.ToString());
            }
        }
        Debug.Print("count = " + count);
    }
}

I tried to use the HtmlAgilityPack documentation, but it lacks examples, and the descriptions of its methods and classes are really hard for me to understand without them.


  • Try this. Sorry if it doesn't build as-is; I adapted our own code to your situation.

    List<string> result = new List<string>();
    // Note: SelectNodes returns null when no <img src="..."> exists.
    foreach (HtmlNode link in document.DocumentNode.SelectNodes("//img[@src]"))
    {
        HtmlAttribute att = link.Attributes["src"];
        string temp = att.Value;
        string urlValue;
        // Decode repeatedly until the value stops changing.
        do
        {
            urlValue = temp;
            temp = HttpUtility.UrlDecode(HttpUtility.HtmlDecode(urlValue));
        } while (temp != urlValue);
        result.Add(temp);
    }
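
For the "source stored in a string variable" part of the question, here is a minimal self-contained sketch. It assumes the HtmlAgilityPack NuGet package is referenced. The class name ImageExtractor, the method GetImageSources, and the example.com URLs are hypothetical names made up for illustration. It parses the string with LoadHtml, restricts the search to images inside the div with the given id, and guards against the null that SelectNodes returns when nothing matches.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class ImageExtractor
{
    // Hypothetical helper: returns the src value of every <img>
    // inside the <div> whose id equals containerId.
    public static List<string> GetImageSources(string html, string containerId)
    {
        var document = new HtmlDocument();
        document.LoadHtml(html); // parses the string; nothing is downloaded

        var result = new List<string>();

        // Matches <img> descendants of the div that carry a src attribute.
        // containerId is concatenated into the XPath as-is, so this sketch
        // assumes it does not contain a single quote.
        HtmlNodeCollection images = document.DocumentNode.SelectNodes(
            "//div[@id='" + containerId + "']//img[@src]");

        // SelectNodes returns null, not an empty collection, on no match.
        if (images == null)
        {
            return result;
        }

        foreach (HtmlNode image in images)
        {
            result.Add(image.GetAttributeValue("src", string.Empty));
        }
        return result;
    }

    public static void Main()
    {
        // Placeholder markup modelled on the question; the URLs are invented.
        string html =
            "<div id='myWeb123'>" +
            "<img src=\"http://example.com/a.jpg\" alt='myWebSitePics' />" +
            "<img src=\"http://example.com/b.jpg\" alt='myWebSitePics' />" +
            "</div>";

        foreach (string src in GetImageSources(html, "myWeb123"))
        {
            Console.WriteLine(src);
        }
    }
}
```

The code in the question selects the div itself and then reads "src" from it, which is why linkTitle comes back empty. Selecting the img nodes directly, as above and in the answer's "//img[@src]", avoids that.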