I'm writing a personal app that scrapes data from a website. It currently pulls entire pages before analyzing them, and these pages can range from 300 - 600 KiB; the 10 pages I tested against total 4 MiB. The pages contain dynamic content, so I don't know exactly where the data starts, but I do have delimiters that tell me where the data is once I've scanned the page. Is there any way to download only up to the portion that I need? That would cut the total download down to 2 MiB for those 10 pages.
Here is a simple example where you read from the response stream in 10-character blocks until you hit your delimiter, then stop; disposing the response closes the connection, so the rest of the page is never downloaded. The specifics are up to you to handle (for instance, a delimiter that straddles a block boundary won't be detected by this naive comparison), but I think this represents an easy method to achieve what you want.
using System.Net;
using System.Text;

StringBuilder sb = new StringBuilder();
HttpWebRequest req = (HttpWebRequest)WebRequest.Create("http://example.com");
using (var resp = req.GetResponse())
using (StreamReader sr = new StreamReader(resp.GetResponseStream()))
{
    char[] block = new char[10];
    int read;
    // Read in 10-character blocks until the delimiter shows up or the stream ends.
    // myDelim is your own 10-character delimiter.
    while ((read = sr.ReadBlock(block, 0, block.Length)) > 0)
    {
        if (block.CharEquals(myDelim))
            break;
        sb.Append(block, 0, read);
    }
}
// Process the StringBuilder here.
Please note that CharEquals is an extension method that simply compares whether two character arrays are equal - there's nothing special to it.
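For completeness, since the answer doesn't show it, CharEquals might look something like this (a minimal sketch; the class and parameter names are my own):

```csharp
public static class CharArrayExtensions
{
    // Returns true if both arrays are non-null, have the same length,
    // and contain identical characters in the same order.
    public static bool CharEquals(this char[] a, char[] b)
    {
        if (a == null || b == null || a.Length != b.Length)
            return false;
        for (int i = 0; i < a.Length; i++)
        {
            if (a[i] != b[i])
                return false;
        }
        return true;
    }
}
```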