Tags: c#, html, mshtml, web-crawler

How to get HTML element coordinates using C#?


I am planning to develop a web crawler that extracts the coordinates of HTML elements from web pages. I have found that element coordinates can be obtained using the "mshtml" assembly. Is it possible to fetch only the necessary information (HTML, CSS) from a web page and then use the appropriate mshtml classes to get the correct coordinates of all HTML elements? If so, how?

Thank you!


Solution

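  • I use these C# functions to determine element positions. Pass in a reference to the HTML element in question. Each function walks up the element's offsetParent chain and sums the offsets, which gives the position relative to the outermost offset parent (normally the page body).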

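    // Returns the element's horizontal offset, in pixels, by summing
    // offsetLeft up the offsetParent chain.
    public static int findPosX( mshtml.IHTMLElement obj ) 
    {
      int curleft = 0;
      // Stop at the outermost offset parent, whose own offsetParent is null.
      while (obj != null && obj.offsetParent != null ) 
      {
        curleft += obj.offsetLeft;
        obj = obj.offsetParent;
      }
    
      return curleft;
    }
    
    // Returns the element's vertical offset, in pixels, by summing
    // offsetTop up the offsetParent chain.
    public static int findPosY( mshtml.IHTMLElement obj ) 
    {
      int curtop = 0;
      // Stop at the outermost offset parent, whose own offsetParent is null.
      while (obj != null && obj.offsetParent != null ) 
      {
        curtop += obj.offsetTop;
        obj = obj.offsetParent;
      }
    
      return curtop;
    }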
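
    As a possible alternative to walking the offsetParent chain, mshtml also exposes getBoundingClientRect() on IHTMLElement2. The sketch below is not part of the original answer; the class and method names (ElementRects, GetClientRect) are made up for illustration, and it assumes the element object can be cast to IHTMLElement2. The rectangle it returns is relative to the visible browser window rather than the document, so scroll offsets would have to be added to get document coordinates.

    ```csharp
    using mshtml;

    public static class ElementRects
    {
        // Hypothetical helper: reads the element's bounding rectangle,
        // which mshtml reports relative to the client area (viewport).
        public static void GetClientRect(
            IHTMLElement element,
            out int left, out int top, out int width, out int height)
        {
            // Assumption: the element also implements IHTMLElement2.
            IHTMLRect rect = ((IHTMLElement2)element).getBoundingClientRect();

            left = rect.left;
            top = rect.top;
            width = rect.right - rect.left;
            height = rect.bottom - rect.top;
        }
    }
    ```

    Results from the two approaches can differ when the page is scrolled or when elements sit inside scrolling containers, so it is worth comparing both on a few sample pages before relying on either.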
    

    I get HTML elements from the current document like so:

    // start an instance of IE (needs COM references to SHDocVw and mshtml)
    SHDocVw.InternetExplorerClass ie = new SHDocVw.InternetExplorerClass();
    ie.Visible = true;
    
    // load a URL
    object flags = null, targetFrameName = null, postData = null, headers = null;
    ie.Navigate( url, ref flags, ref targetFrameName, ref postData, ref headers );
    
    // wait until the page has finished loading (Thread is in System.Threading)
    while( ie.Busy || ie.ReadyState != SHDocVw.tagREADYSTATE.READYSTATE_COMPLETE )
    {
      Thread.Sleep( 500 );
    }
    
    // get an element from the loaded document
    mshtml.HTMLDocumentClass document = (mshtml.HTMLDocumentClass)ie.Document;
    mshtml.IHTMLElement element = document.getElementById("myelementsid");
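
    Since the question asks about the coordinates of all elements, here is a rough sketch of how the two pieces might be combined: it enumerates the document's `all` collection and prints each element's position and size. This is an illustration rather than tested crawler code. The names ElementPositions, GetPosition and DumpAll are hypothetical, and it assumes every item in the collection can be cast to IHTMLElement.

    ```csharp
    using System;
    using mshtml;

    public static class ElementPositions
    {
        // Same walk as findPosX/findPosY, computing both axes in one pass.
        public static void GetPosition(IHTMLElement element, out int x, out int y)
        {
            x = 0;
            y = 0;
            IHTMLElement node = element;
            while (node != null && node.offsetParent != null)
            {
                x += node.offsetLeft;
                y += node.offsetTop;
                node = node.offsetParent;
            }
        }

        // Prints tag name, id, position and size for every element in the document.
        public static void DumpAll(IHTMLDocument2 document)
        {
            // Assumption: each item in document.all is an IHTMLElement.
            foreach (IHTMLElement element in document.all)
            {
                int x, y;
                GetPosition(element, out x, out y);
                Console.WriteLine("{0} id={1} x={2} y={3} w={4} h={5}",
                    element.tagName, element.id, x, y,
                    element.offsetWidth, element.offsetHeight);
            }
        }
    }
    ```

    With the document object loaded as above, this would be called as `ElementPositions.DumpAll(document);`. The page has to be fully loaded and laid out by IE first, because the offsets come from the rendered layout and not from the raw HTML and CSS.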