Tags: c#, web-scraping, hyperlink, richtextbox

C#: Can I scrape a WebBrowser control for links?


I'm currently learning C# and it's fun so far, but I have hit a roadblock.

I have a program that scrapes information from a web page loaded in a WebBrowser control.

So far I can get the HTML:

HtmlWindow window = webBrowser1.Document.Window;
string str = window.Document.Body.OuterHtml;
richTextBox1.Text = str;

and the text:

HtmlWindow window = webBrowser1.Document.Window;
string str = window.Document.Body.OuterText;
richTextBox1.Text = str;

I tried to scrape and display the links like this:

HtmlWindow window = webBrowser1.Document.Window;
string str = window.Document.Body.GetElementsByTagName("A").ToString();
richTextBox1.Text = str;

But instead, the RichTextBox on the form shows this:

System.Windows.Forms.HtmlElementCollection

How can I get a list of the links on the current web page to show in the text box?

Thanks, Chris


Solution

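  • GetElementsByTagName("A") returns an HtmlElementCollection, and calling ToString() on it returns only the type name, which is why the text box shows System.Windows.Forms.HtmlElementCollection. You need to loop over the elements instead. With the HtmlAgilityPack library it's easy: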

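    // Requires the HtmlAgilityPack NuGet package.
    HtmlWindow window = webBrowser1.Document.Window;
    string html = window.Document.Body.OuterHtml;
    
    HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();
    htmlDoc.LoadHtml(html);
    
    // By default SelectNodes returns null (not an empty collection) when nothing matches.
    HtmlAgilityPack.HtmlNodeCollection nodes = htmlDoc.DocumentNode.SelectNodes("//a");
    
    var sb = new System.Text.StringBuilder();
    if (nodes != null)
    {
        foreach (HtmlAgilityPack.HtmlNode node in nodes)
        {
            // OuterHtml is the whole <a ...>...</a> tag, not just the URL.
            sb.AppendLine(node.OuterHtml);
        }
    }
    richTextBox1.Text = sb.ToString();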
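If you would rather not add a dependency, the WebBrowser control's own HtmlElementCollection can also be iterated directly. The following is a minimal sketch, not a definitive implementation. It assumes that "links" means the href values rather than the whole anchor tags. LinkScraper and ListLinks are hypothetical names chosen for illustration.

```csharp
using System.Text;
using System.Windows.Forms;

// Sketch only: LinkScraper and ListLinks are hypothetical names.
// Lists the href of every <a> element in a page hosted by a WebBrowser control.
public static class LinkScraper
{
    public static string ListLinks(HtmlDocument document)
    {
        // WebBrowser.Document is null until a page has been loaded.
        if (document == null)
        {
            return string.Empty;
        }

        var sb = new StringBuilder();

        // Iterate the collection; ToString() on it would only give the type name.
        foreach (HtmlElement link in document.GetElementsByTagName("A"))
        {
            // GetAttribute returns an empty string when the attribute is missing,
            // e.g. for a named anchor such as <a name="top">.
            string href = link.GetAttribute("href");
            if (!string.IsNullOrEmpty(href))
            {
                sb.AppendLine(href);
            }
        }

        return sb.ToString();
    }
}
```

It would typically be called once the page has finished loading, for example from the WebBrowser's DocumentCompleted event handler:

    richTextBox1.Text = LinkScraper.ListLinks(webBrowser1.Document);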