Use explorer.document as source HtmlDocument for HtmlAgilityPack


I want to use the webpage currently loaded in Internet Explorer as the HtmlDocument for HtmlAgilityPack. I am accessing the explorer document through mshtml as a COM object.

mshtml.HTMLDocument doc = explorer.Document as mshtml.HTMLDocument;

Then I tried to convert it to the HtmlDocument type used by HtmlAgilityPack:

HtmlAgilityPack.HtmlDocument hdoc = (HtmlAgilityPack.HtmlDocument)doc;

But this does not work: the cast throws an invalid cast exception.

I still want to use the currently loaded webpage as the source for HtmlAgilityPack. I know I can use the HtmlWeb class provided by HtmlAgilityPack to load the current URL, but I want to highlight elements in the loaded page (elements found using HtmlAgilityPack), and I don't think that can be done with that approach. Any ideas on how to implement this would be appreciated. Thanks.


Solution

  • You can't cast between mshtml.HTMLDocument and HtmlAgilityPack.HtmlDocument. They are completely distinct classes from different libraries: HtmlAgilityPack.HtmlDocument is purely managed, while mshtml.HTMLDocument is a managed wrapper around a COM object.

    What you can do is grab the HTML from the mshtml.HTMLDocument and load it into the Agility Pack.

    Probably something along these lines:

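      // Get the live document from the running Internet Explorer instance.
      mshtml.IHTMLDocument3 sourceDoc = (mshtml.IHTMLDocument3) explorer.Document;
      // Serialize the current DOM, starting at the root <html> element.
      string documentContents = sourceDoc.documentElement.outerHTML;

      // Parse that markup into a separate, purely managed Agility Pack document.
      HtmlAgilityPack.HtmlDocument targetDoc = new HtmlAgilityPack.HtmlDocument();
      targetDoc.LoadHtml(documentContents);
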

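    You could also query the document for IPersistStream and call its Save method. Save writes to a COM IStream, so a MemoryStream has to be wrapped in an IStream implementation first; the saved bytes can then be fed to the HtmlAgilityPack.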
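
Because the Agility Pack document is only a parsed copy, changing it does not affect the page shown in the browser. One possible way to get the highlighting the question asks for is to find the matches in the copy and then look up the corresponding live elements by their id attribute. The sketch below is only an illustration under assumptions: the class and method names (LivePageHighlighter, FindIds, Highlight) are hypothetical, it only works for elements that actually carry an id, and the page may have changed between taking the snapshot and the lookup.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper, not part of HtmlAgilityPack or mshtml.
public static class LivePageHighlighter
{
    // Parses an HTML snapshot and returns the id attributes of all nodes
    // matching the XPath. Nodes without an id are skipped because they
    // cannot be found in the live page with getElementById.
    public static List<string> FindIds(string html, string xpath)
    {
        var snapshot = new HtmlAgilityPack.HtmlDocument();
        snapshot.LoadHtml(html);

        var ids = new List<string>();
        HtmlNodeCollection matches = snapshot.DocumentNode.SelectNodes(xpath);
        if (matches == null)
        {
            return ids; // SelectNodes returns null when nothing matches
        }

        foreach (HtmlNode node in matches)
        {
            string id = node.GetAttributeValue("id", "");
            if (id.Length > 0)
            {
                ids.Add(id);
            }
        }
        return ids;
    }

    // Takes a snapshot of the live document, finds matches with the
    // Agility Pack, and sets a background color on the live elements.
    public static void Highlight(mshtml.IHTMLDocument3 liveDoc, string xpath)
    {
        string html = liveDoc.documentElement.outerHTML;
        foreach (string id in FindIds(html, xpath))
        {
            mshtml.IHTMLElement element = liveDoc.getElementById(id);
            if (element != null)
            {
                element.style.backgroundColor = "yellow";
            }
        }
    }
}
```

With the explorer object from the question, a call might look like `LivePageHighlighter.Highlight((mshtml.IHTMLDocument3) explorer.Document, "//a[@id]");`. Elements without an id would need some other mapping back to the live DOM, which this sketch does not attempt.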