I'm trying to download HTML so I can parse it, using as little bandwidth as possible. Here is part of my code.
if (!String.IsNullOrEmpty(siteAddress))
{
    WebRequest webReq = WebRequest.Create(siteAddress);
    // The using blocks close the response and the reader even if an exception is thrown.
    using (WebResponse webRes = webReq.GetResponse())
    using (StreamReader streamRead = new StreamReader(webRes.GetResponseStream()))
    {
        // Read the whole response body (the HTML text) into a string.
        string html = streamRead.ReadToEnd().Trim();
        HtmlAgilityPack.HtmlDocument hDoc = new HtmlAgilityPack.HtmlDocument();
        hDoc.LoadHtml(html);
    }
}
Can someone confirm that retrieving the response only downloads the HTML text, and that no images are downloaded along with it? What about when loading it with HtmlAgilityPack's Load method?
When using WebClient, WebRequest or HtmlAgilityPack, only the HTML is downloaded.
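As a rough illustration of the WebClient route, here is a minimal sketch. The class and method names (HtmlOnly, Fetch) are made up for this example, and it assumes the HtmlAgilityPack package is referenced. A single request is issued for the address you pass in, and parsing the returned string does not trigger any further requests.

```csharp
using System;
using System.Net;
using HtmlAgilityPack;

static class HtmlOnly
{
    // Hypothetical helper: downloads the document at siteAddress and parses it.
    // Only this one request is made; <img>, <script> and <link> targets are not fetched.
    public static HtmlDocument Fetch(string siteAddress)
    {
        using (var client = new WebClient())
        {
            string html = client.DownloadString(siteAddress);
            var doc = new HtmlDocument();
            doc.LoadHtml(html); // parses the string in memory, no network access
            return doc;
        }
    }
}
```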
If you want the images (or other resources), you have to find their URLs in the downloaded document and issue the requests for them yourself.
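Something along these lines could work for images. It is only a sketch: ImageFetcher, GetImageUrls and DownloadImages are names invented for this example, the output file naming is arbitrary, and it only looks at the src attribute of img tags (not CSS backgrounds or srcset).

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using HtmlAgilityPack;

static class ImageFetcher
{
    // Returns the absolute http/https URLs of all <img src="..."> tags in the HTML.
    // pageUri is the address the HTML was downloaded from; relative src values
    // are resolved against it.
    public static List<Uri> GetImageUrls(string html, Uri pageUri)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        var urls = new List<Uri>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var nodes = doc.DocumentNode.SelectNodes("//img[@src]");
        if (nodes == null) return urls;

        foreach (HtmlNode img in nodes)
        {
            string src = img.GetAttributeValue("src", "");
            Uri absolute;
            if (src.Length == 0 || !Uri.TryCreate(pageUri, src, out absolute))
                continue;
            // Skip data: URIs and other schemes that are not plain web requests.
            if (absolute.Scheme == Uri.UriSchemeHttp || absolute.Scheme == Uri.UriSchemeHttps)
                urls.Add(absolute);
        }
        return urls;
    }

    // Issues one request per image and saves the files as image0, image1, ... in targetDir.
    public static void DownloadImages(string html, Uri pageUri, string targetDir)
    {
        using (var client = new WebClient())
        {
            int i = 0;
            foreach (Uri url in GetImageUrls(html, pageUri))
            {
                string ext = Path.GetExtension(url.AbsolutePath);
                client.DownloadFile(url, Path.Combine(targetDir, "image" + i + ext));
                i++;
            }
        }
    }
}
```

Until you call something like DownloadImages, no image traffic occurs, so the bandwidth used is just the size of the HTML response.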
If you want to experiment a bit, the WebBrowser control could be something to look at. From it, you can take the Document property, look at its Images property, and download all the images yourself.
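A rough sketch of the WebBrowser idea, assuming a Windows Forms project (System.Windows.Forms referenced) and using http://example.com/ purely as a placeholder address. Keep in mind that the control renders the page like a browser would, so it may fetch images and other resources on its own while loading; it is not the low-bandwidth option.

```csharp
using System;
using System.Windows.Forms;

static class Program
{
    [STAThread] // the WebBrowser control requires a single-threaded apartment
    static void Main()
    {
        var browser = new WebBrowser { ScriptErrorsSuppressed = true };

        browser.DocumentCompleted += (sender, e) =>
        {
            // DocumentCompleted also fires for frames; wait for the top-level page.
            if (e.Url != browser.Url) return;

            // Document.Images holds one HtmlElement per <img> in the loaded page.
            foreach (HtmlElement img in browser.Document.Images)
                Console.WriteLine(img.GetAttribute("src"));

            Application.ExitThread(); // stop the message loop started below
        };

        browser.Navigate("http://example.com/"); // placeholder address
        Application.Run(); // message loop needed for navigation events to fire
    }
}
```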
What do you want to do?