I am writing a tool that allows the user to input a URL, to which the program responds by attempting to show that website's favicon. I have this working for many sites, but one site that is giving me trouble is my self-hosted Trac site. It seems that Trac's normal behaviour, until the end user is authenticated, is to show a custom 403 page (Forbidden) inviting the user to log in. Accessing Trac from a web browser, the favicon displays in the browser's tab even though I'm not logged in (and Firebug, for instance, shows a 403 for the page content). If I view source from the browser, the favicon's location is right there in the source. However, from my application, requesting the Trac website with request.GetResponse() throws a WebException containing a 403, giving me no opportunity to read the response stream that contains the vital information required to find the favicon.
I already have code to download a website's HTML and extract the location of its favicon. What I am stuck with is downloading a site's HTML even when it responds with a 403.
I played with various UserAgent, Accept and AcceptLanguage properties of the HttpWebRequest object, but it didn't help. I also tried following any redirects myself, as I read somewhere that .NET doesn't do them well. Still no luck.
Here's what I have:
public static MemoryStream DownloadHtml(
    string urlParam,
    int timeoutMs = DefaultHttpRequestTimeoutMs,
    string userAgent = "",
    bool silent = false
)
{
    MemoryStream result = null;
    HttpWebRequest request = null;
    HttpWebResponse response = null;
    try
    {
        Func<string, HttpWebRequest> createRequest = (urlForFunc) =>
        {
            var requestForAction = (HttpWebRequest)HttpWebRequest.Create(urlForFunc);
            // This step is now required by Wikipedia (and others?) to prevent periodic or
            // even constant 403's (Forbidden).
            requestForAction.UserAgent = userAgent;
            requestForAction.Accept = "text/html";
            requestForAction.AllowAutoRedirect = false;
            requestForAction.Timeout = timeoutMs;
            return requestForAction;
        };
        string urlFromResponse = "";
        string urlForRequest = "";
        do
        {
            if(response == null)
            {
                urlForRequest = urlParam;
            }
            else
            {
                urlForRequest = urlFromResponse;
                response.Close();
            }
            request = createRequest(urlForRequest);
            response = (HttpWebResponse)request.GetResponse();
            urlFromResponse = response.Headers[HttpResponseHeader.Location];
        }
        while(urlFromResponse != null
            && urlFromResponse.Length > 0
            && urlFromResponse != urlForRequest);
        using(var stream = response.GetResponseStream())
        {
            result = new MemoryStream();
            stream.CopyTo(result);
        }
    }
    catch(WebException ex)
    {
        // Things like 404 and, well, all other web-type exceptions.
        Debug.WriteLine(ex.Message);
        if(ex.InnerException != null) Debug.WriteLine(ex.InnerException.Message);
    }
    catch(System.Threading.ThreadAbortException)
    {
        // Let ac.Thread handle some cleanup.
        throw;
    }
    catch(Exception)
    {
        if(!silent) throw;
    }
    finally
    {
        if(response != null) response.Close();
    }
    return result;
}
The stream content is stored in the WebException object itself, via its Response property:

var resp = new StreamReader(ex.Response.GetResponseStream()).ReadToEnd();
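Building on that one-liner, here is a minimal sketch of how the 403 body could be recovered inside the catch(WebException ex) block above. The helper name TryReadErrorBody is my own, not part of any API; note that WebException.Response can be null (e.g. on timeouts or DNS failures), so it needs a guard before dereferencing:

```csharp
using System;
using System.IO;
using System.Net;

static class FaviconHelper
{
    // Returns the response body carried by a WebException (for example,
    // Trac's custom 403 page, which still contains the favicon link),
    // or null when the exception has no response attached at all
    // (timeouts, DNS failures, connection resets, etc.).
    public static string TryReadErrorBody(WebException ex)
    {
        var response = ex.Response as HttpWebResponse;
        if (response == null) return null;

        using (var stream = response.GetResponseStream())
        using (var reader = new StreamReader(stream))
        {
            return reader.ReadToEnd();
        }
    }
}
```

In DownloadHtml, the catch(WebException ex) block could then call this helper and copy the returned text into the MemoryStream result instead of discarding the response, so the favicon extraction code still has HTML to work with even on a 403.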