asp.net html regex page-title asp.net-webpages

How to extract the title image of the web page


I want to extract the title image (the favicon) of a web page using C# in ASP.NET. I checked the window and document objects, but neither has a property for it. I'm looking for a way to retrieve the image that Chrome shows in the page tab.


Solution

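  • // Requires: using System.Net;
    using (WebClient client = new WebClient())
    {
      // Many sites serve their icon from the conventional /favicon.ico path.
      byte[] favicon = client.DownloadData("http://msite.com/favicon.ico");
    }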
    

    This uses WebClient.DownloadData, which returns the icon as a byte array. Use WebClient.DownloadFile instead if you want to save it to disk.

    A more robust approach, since not every site keeps its icon at the default path, is to download the index page and use an HTML parser to find the <link> tag that specifies where the icon is (the same technique works for apple-touch-icon and similar tags).

    The tags I believe you're looking to parse are:

    <!-- StackOverflow's implementation: -->
    <link rel="shortcut icon" href="http://cdn.../favicon.ico">
    <link rel="apple-touch-icon" href="http://cdn.../apple-touch-icon.png">
    
    <!-- Google's implementation: -->
    <meta content="/images/google_favicon_128.png" itemprop="image">
    
    <!-- Facebook's implementation: -->
    <link href="http://static.ak.fbcdn.net/.../q9U99v3_saj.ico" rel="shortcut icon">
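
The "download the page and look for the <link> tag" idea could be sketched roughly as follows. This is only an illustration under some assumptions: FaviconFinder, FindIconUrl and DownloadIcon are hypothetical names, and a small regex stands in for a real HTML parser, so it only handles quoted attribute values and ignores the <meta itemprop="image"> variant shown above. It falls back to /favicon.ico when the page declares no icon.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class FaviconFinder
{
    // Matches each <link ...> tag; attributes are read separately so that
    // rel and href may appear in either order.
    private static readonly Regex LinkTag =
        new Regex(@"<link\b[^>]*>", RegexOptions.IgnoreCase);

    // Reads a quoted attribute value such as href="..." or rel='...'.
    // Returns null when the attribute is missing or unquoted.
    private static string GetAttribute(string tag, string name)
    {
        Match m = Regex.Match(
            tag,
            @"\b" + name + @"\s*=\s*(?:""([^""]*)""|'([^']*)')",
            RegexOptions.IgnoreCase);
        if (!m.Success) return null;
        return m.Groups[1].Success ? m.Groups[1].Value : m.Groups[2].Value;
    }

    // Returns the absolute URL of the first <link> whose rel list contains
    // the token "icon" (this covers rel="icon" and rel="shortcut icon"),
    // or the conventional /favicon.ico location if there is none.
    public static Uri FindIconUrl(string html, Uri pageUri)
    {
        foreach (Match link in LinkTag.Matches(html))
        {
            string rel = GetAttribute(link.Value, "rel");
            string href = GetAttribute(link.Value, "href");
            if (rel == null || string.IsNullOrEmpty(href)) continue;

            foreach (string token in rel.Split(' '))
            {
                if (token.Equals("icon", StringComparison.OrdinalIgnoreCase))
                {
                    // Resolves relative hrefs against the page address.
                    return new Uri(pageUri, href);
                }
            }
        }
        return new Uri(pageUri, "/favicon.ico");
    }

    // Downloads the page, locates its icon, and returns the icon bytes.
    public static byte[] DownloadIcon(string pageUrl)
    {
        Uri pageUri = new Uri(pageUrl);
        using (WebClient client = new WebClient())
        {
            string html = client.DownloadString(pageUri);
            return client.DownloadData(FindIconUrl(html, pageUri));
        }
    }
}
```

Calling FaviconFinder.DownloadIcon("http://msite.com/") would then return the icon bytes, which you could also write to disk. For production use, a proper HTML parser is likely a safer choice than the regex here, which can be fooled by unusual markup.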