Using MSHTML in VB.Net to parse HTML


Was wondering if someone could give me some direction on this. I've spent a decent amount of time on it and don't seem to be getting anywhere:

I have a hidden field that I'm trying to parse out of an HTML document in VB.Net. I'm using a System.Windows.Controls.WebBrowser control in a WPF application and handling the LoadCompleted event. Inside the LoadCompleted event handler I do something like this:

Dim htmlDocument As mshtml.IHTMLDocument2 = Me.WebBrowser.Document
Dim allElements As mshtml.IHTMLElementCollection = htmlDocument.body.all
Dim hiddenField As mshtml.IHTMLInputElement = allElements.tags("hidField")

The hidden field that I'm trying to access is declared in my .aspx file as such:

<asp:HiddenField runat="server" ID="hidField"/>

The problem is that allElements.tags("hidField") returns Nothing. Am I doing something wrong with the mshtml library? I don't have much experience with it, and gathered that I needed to do something like this to find my hidden field element. Let me know if you need more info. Thanks in advance for the help.

EDIT
Here is my final working solution for anyone interested:

    Dim htmlDocument As mshtml.IHTMLDocument2 = Me.WebBrowser.Document
    Dim allElements As mshtml.IHTMLElementCollection = htmlDocument.body.all
    ' tags() filters by tag name, so ask for the rendered <input> elements
    Dim allInputs As mshtml.IHTMLElementCollection = allElements.tags("input")

    For Each element As mshtml.IHTMLInputElement In allInputs
        ' AndAlso short-circuits, so Contains is never called on a missing name
        If element.type = "hidden" AndAlso element.name IsNot Nothing AndAlso element.name.Contains("hidField") Then
            MessageBox.Show(element.value)
        End If
    Next

Solution

  • You need to look for the rendered tag, not the server-side ID.

    tags() filters by tag name, not by ID. The HiddenField control is rendered as an <input type="hidden">, so you need to use allElements.tags("input"), then find the specific hidden one. The id attribute may not end up as hidField - it depends on which container controls the field sits in and how deeply it is nested.

    I suggest using the HTML Agility Pack to parse the HTML and find the element instead - it should be easier to use than MSHTML.
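
For anyone who wants to try the HTML Agility Pack route, here is a minimal sketch. It assumes the HtmlAgilityPack NuGet package is referenced and that you already have the page markup as a string. The FindHiddenValue helper, the sample markup, and the ctl00_Main_ prefix are made up for illustration. Your rendered id will depend on your page's container controls.

```vb
Imports System
Imports System.Linq
Imports HtmlAgilityPack

Module HiddenFieldDemo
    ' Hypothetical helper: returns the value of the first hidden <input>
    ' whose rendered id ends with idSuffix, or Nothing if there is none.
    Function FindHiddenValue(html As String, idSuffix As String) As String
        Dim doc As New HtmlDocument()
        doc.LoadHtml(html)

        ' SelectNodes returns Nothing (not an empty collection) when nothing matches.
        Dim inputs = doc.DocumentNode.SelectNodes("//input[@type='hidden']")
        If inputs Is Nothing Then Return Nothing

        ' Match on the end of the id because ASP.NET may prefix it with container ids.
        Dim match = inputs.FirstOrDefault(
            Function(n) n.GetAttributeValue("id", "").EndsWith(idSuffix, StringComparison.Ordinal))

        Return If(match Is Nothing, Nothing, match.GetAttributeValue("value", ""))
    End Function

    Sub Main()
        ' Assumed sample of what <asp:HiddenField ID="hidField"> might render to.
        Dim sample As String =
            "<form><input type=""hidden"" name=""ctl00$Main$hidField"" " &
            "id=""ctl00_Main_hidField"" value=""42"" /></form>"
        Console.WriteLine(FindHiddenValue(sample, "hidField"))
    End Sub
End Module
```

Matching on the suffix of the id, rather than the whole id, follows the point above: the server-side ID hidField is usually only the tail of the id that reaches the browser.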