Tags: c#, html, http-status-code-403, favicon

How to get the favicon from a 403 page


I am writing a tool that lets the user enter a URL and then tries to show that website's favicon. This works for many sites, but one that gives me trouble is my self-hosted Trac site. Until the user is authenticated, Trac's normal behaviour seems to be to return a custom 403 (Forbidden) page inviting the user to log in. When I open Trac in a web browser, the favicon appears in the browser's tab even though I'm not logged in (and Firebug, for instance, shows a 403 for the page content). If I view the page source in the browser, the favicon's location is right there. From my application, however, requesting the Trac site with request.GetResponse() throws a WebException for the 403, which gives me no opportunity to read the response stream that holds the information I need to find the favicon.

I already have code to download a website's HTML and extract the location of its favicon. What I am stuck with is downloading a site's HTML even when it responds with a 403.
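For context, browsers locate the icon by looking for a <link> element whose rel attribute contains the "icon" keyword (for example rel="icon" or rel="shortcut icon") and otherwise fall back to requesting /favicon.ico from the site root. That is why the location can be read from the 403 page's HTML just as well as from a 200 page. A rough sketch of that lookup, assuming the HTML is already in a string; FaviconLocator and FindFaviconUri are hypothetical names, and the regexes are a heuristic rather than a real HTML parser:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: a regex-based heuristic, not a full HTML parser.
public static class FaviconLocator
{
    // Matches whole <link ...> tags; attributes are checked separately,
    // so their order inside the tag does not matter.
    private static readonly Regex LinkTag =
        new Regex(@"<link\b[^>]*>", RegexOptions.IgnoreCase);

    private static readonly Regex RelAttr =
        new Regex(@"\brel\s*=\s*[""']?([^""'>]+)", RegexOptions.IgnoreCase);

    private static readonly Regex HrefAttr =
        new Regex(@"\bhref\s*=\s*[""']([^""']+)[""']", RegexOptions.IgnoreCase);

    // Returns the absolute favicon URI for a page, falling back to /favicon.ico.
    public static Uri FindFaviconUri(Uri pageUri, string html)
    {
        foreach (Match tag in LinkTag.Matches(html))
        {
            Match rel = RelAttr.Match(tag.Value);
            Match href = HrefAttr.Match(tag.Value);
            if (!rel.Success || !href.Success) continue;

            // rel is a space-separated keyword list, e.g. "shortcut icon".
            // "apple-touch-icon" is a different keyword and is skipped.
            string[] tokens = rel.Groups[1].Value.Split(
                new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries);
            if (Array.Exists(tokens, t => t.Equals("icon", StringComparison.OrdinalIgnoreCase)))
            {
                // Relative hrefs are resolved against the page's own URL.
                return new Uri(pageUri, href.Groups[1].Value);
            }
        }

        // No icon <link> found: use the conventional root location.
        return new Uri(pageUri, "/favicon.ico");
    }
}
```

Because the 403 page is ordinary HTML, the same lookup applies once its body can be read.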

I experimented with various UserAgent, Accept and Accept-Language settings on the HttpWebRequest object, but none of them helped. I also tried following redirects myself, since I read somewhere that .NET doesn't handle them well. Still no luck.

Here's what I have:

public static MemoryStream DownloadHtml(
        string urlParam, 
        int timeoutMs = DefaultHttpRequestTimeoutMs, 
        string userAgent = "", 
        bool silent = false
)
{
    MemoryStream result = null;

    HttpWebRequest request = null;
    HttpWebResponse response = null;

    try
    {
        Func<string, HttpWebRequest> createRequest = (urlForFunc) =>
        {
            var requestForAction = (HttpWebRequest)HttpWebRequest.Create(urlForFunc);

            // This step is now required by Wikipedia (and others?) to prevent periodic or 
            // even constant 403's (Forbidden).
            requestForAction.UserAgent = userAgent;

            requestForAction.Accept = "text/html";
            requestForAction.AllowAutoRedirect = false;
            requestForAction.Timeout = timeoutMs;

            return requestForAction;
        };

        string urlFromResponse = "";
        string urlForRequest = "";

        do
        {
            if(response == null)
            {
                urlForRequest = urlParam;
            }
            else
            {
                urlForRequest = urlFromResponse;

                response.Close();
            }

            request = createRequest(urlForRequest);
            response = (HttpWebResponse)request.GetResponse();

            urlFromResponse = response.Headers[HttpResponseHeader.Location];
        }
        while(urlFromResponse != null 
                && urlFromResponse.Length > 0 
                && urlFromResponse != urlForRequest);

        using(var stream = response.GetResponseStream())
        {
            result = new MemoryStream();
            stream.CopyTo(result);
        }
    }
    catch(WebException ex)
    {
        // Things like 404 and, well, all other web-type exceptions.

        Debug.WriteLine(ex.Message);
        if(ex.InnerException != null) Debug.WriteLine(ex.InnerException.Message);
    }
    catch(System.Threading.ThreadAbortException)
    {
        // Let ac.Thread handle some cleanup.
        throw;
    }
    catch(Exception)
    {
        if(!silent) throw;
    }
    finally
    {
        if(response != null) response.Close();
    }

    return result;
}
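One caveat with the manual redirect loop above: it passes the raw Location header straight to createRequest, but a Location value may be a relative reference (RFC 7231 allows this), which cannot be requested without resolving it against the URL that produced it. A minimal sketch of that resolution step, with RedirectHelper.ResolveLocation as a hypothetical name; it also signals when the loop should stop:

```csharp
using System;

// Hypothetical helper for the manual redirect loop; not part of the original code.
public static class RedirectHelper
{
    // Returns the absolute URI to request next, or null when there is nothing
    // (or nothing new) to follow.
    public static Uri ResolveLocation(Uri requestedUri, string locationHeader)
    {
        if (string.IsNullOrWhiteSpace(locationHeader))
            return null;

        // Works for absolute values ("https://host/x") and for relative ones
        // ("/login", "../x"), which are resolved against the requested URL.
        Uri next = new Uri(requestedUri, locationHeader.Trim());

        // A redirect back to the same URL would loop forever, so stop there.
        return next == requestedUri ? null : next;
    }
}
```

In the loop, the next request would use next.AbsoluteUri. A cap on the number of hops is still worth adding, because a self-redirect check does not catch cycles that alternate between two URLs.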

Solution

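  • The response body is not lost: WebException.Response holds the server's response (here, the 403 page), so inside catch(WebException ex) you can still read its stream.

    // ex.Response is null when there was no HTTP reply at all (e.g. a timeout or DNS failure).
    var resp = ex.Response == null ? null : new StreamReader(ex.Response.GetResponseStream()).ReadToEnd();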
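Folded back into the question's code, the same idea can be written as a helper that catches the WebException only when the server actually answered (Status == WebExceptionStatus.ProtocolError) and returns ex.Response instead of throwing. A sketch, with GetResponseEvenOnError and ReadBody as hypothetical extension-method names; timeouts and DNS failures still throw, because there is no response to read:

```csharp
using System;
using System.IO;
using System.Net;

// Hypothetical extension methods; names are illustrative only.
public static class WebRequestExtensions
{
    // Like GetResponse(), but returns 4xx/5xx responses instead of throwing,
    // so the body of e.g. a 403 page can still be read.
    public static HttpWebResponse GetResponseEvenOnError(this HttpWebRequest request)
    {
        try
        {
            return (HttpWebResponse)request.GetResponse();
        }
        catch (WebException ex) when (ex.Status == WebExceptionStatus.ProtocolError
                                      && ex.Response is HttpWebResponse)
        {
            // The server replied with an error status; hand its response back.
            return (HttpWebResponse)ex.Response;
        }
    }

    // Reads the whole body as text (UTF-8 unless a byte-order mark says otherwise).
    public static string ReadBody(this HttpWebResponse response)
    {
        using (Stream stream = response.GetResponseStream())
        using (var reader = new StreamReader(stream))
        {
            return reader.ReadToEnd();
        }
    }
}
```

In DownloadHtml, calling request.GetResponseEvenOnError() instead of request.GetResponse() lets the existing GetResponseStream() copy run for the 403 page too; check response.StatusCode if the caller needs to know the page was not a 200. If moving to HttpClient is an option, note that HttpClient.GetAsync does not throw on non-success status codes unless you call EnsureSuccessStatusCode(), so the 403 body is readable directly from the returned response.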