I am trying to figure out how to write the contents of a scraped web page to a downloadable .txt file from an ASP.NET web page. I can already display the scraped links in a label on the page, but I cannot figure out how to write each value on its own line in a .txt file and send that file straight to the client's browser as a download. My current code for writing to the label is:
using System;
using System.Text.RegularExpressions;
using HtmlAgilityPack;

//Read HTML of the page whose URL was entered in urlTextbox
HtmlWeb hw = new HtmlWeb();
HtmlDocument doc = hw.Load(urlTextbox.Text);
//Relative hrefs are resolved against the scraped page's own URL
var baseUrl = new Uri(urlTextbox.Text);
//Selecting body text (SelectNodes returns null when nothing matches)
var bodySec = doc.DocumentNode.SelectNodes("//body[@class]");
if (bodySec != null)
{
    foreach (var node in bodySec)
    {
        //Selecting ONLY links from this body section (relative to node, not the whole document)
        var linkSec = node.SelectNodes(".//a[@href]");
        if (linkSec == null)
            continue;
        foreach (HtmlNode node2 in linkSec)
        {
            string attributeValue = node2.GetAttributeValue("href", "");
            var url = new Uri(baseUrl, attributeValue);
            string links = url.AbsoluteUri;
            var linkLines = Regex.Split(links, @"\-\-\-");
            //Printing links line by line
            foreach (string link in linkLines)
            {
                scriptLbl.Text += link + "<br>";
            }
        }
    }
}
It scrapes the page wonderfully and displays the links in the desired format. Ideally, I would like to write them to a file in the same format and have that file downloaded on the same button click. I tried using StreamWriter for this, but it only ever wrote the first line of the scraped content. Here is my StreamWriter attempt:
Response.ContentType = "text/plain";
Response.AddHeader("content-disposition", "attachment;filename=Urllist.txt");
Response.Clear();
using (StreamWriter writer = new StreamWriter(Response.OutputStream, Encoding.UTF8))
{
    writer.Write(links);
}
Response.End();
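A likely reason only one line came out: at the point of the write, `links` holds a single URL (the inner loop reassigns it on every pass), and `Write` adds no line break. A minimal sketch of the alternative, assuming the href values have already been pulled out with `GetAttributeValue` as in the question: keep every resolved URL in a `List<string>` and write each one with `WriteLine`. `LinkExport`, `CollectLinks` and `WriteLinks` are hypothetical names; only `Uri.TryCreate` and `TextWriter.WriteLine` are framework APIs.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class LinkExport
{
    // Hypothetical helper: resolves every href against the page's base URI and
    // keeps ALL results, instead of holding only the latest URL in one string.
    public static List<string> CollectLinks(Uri baseUri, IEnumerable<string> hrefs)
    {
        var links = new List<string>();
        foreach (string href in hrefs)
        {
            // GetAttributeValue("href", "") yields "" when the attribute is missing
            if (string.IsNullOrWhiteSpace(href))
                continue;

            // Uri.TryCreate(Uri, string, out Uri) combines a base URI with a
            // relative (or absolute) reference and returns false instead of throwing.
            if (Uri.TryCreate(baseUri, href, out Uri absolute))
                links.Add(absolute.AbsoluteUri);
        }
        return links;
    }

    // Hypothetical helper: writes one link per line; WriteLine appends the
    // writer's NewLine string after each item.
    public static void WriteLinks(TextWriter writer, IEnumerable<string> links)
    {
        foreach (string link in links)
            writer.WriteLine(link);
    }
}
```

In the click handler, the inner loop would add each `node2.GetAttributeValue("href", "")` to a `List<string>` of hrefs, and the download block would call `LinkExport.WriteLinks(writer, LinkExport.CollectLinks(new Uri(urlTextbox.Text), hrefs))` in place of `writer.Write(links)`. Because `StreamWriter` derives from `TextWriter`, the same helper works for the response stream and for a `StringWriter` in tests.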
Any help with this would be greatly appreciated. I have tried the answers to similar questions, but none of them gives me the full list of links from the string.
I solved this by reading the items back out of the label into an array and writing them to the response one at a time, each on its own line:
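using System.IO;
using System.Linq;
using System.Text;

string conv = label.Text;
var result = conv.Split(' ');
Response.Clear();
Response.ContentType = "text/plain";
Response.AddHeader("content-disposition", "attachment;filename=Urllist.txt");
using (StreamWriter sw = new StreamWriter(Response.OutputStream, Encoding.UTF8))
{
    //using Distinct to ensure no repeated items (scraping multiple pages with the same links is possible)
    foreach (var s in result.Distinct())
    {
        sw.WriteLine(s); //WriteLine puts each item on its own line
    }
}
Response.End();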
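Splitting on `' '` relies on the label's text being space-separated. The label-filling loop in the question, though, ends each link with `<br>`, so if that is how your label is populated, splitting on that tag may be more reliable. A sketch under that assumption; `LabelLinks.FromLabelText` is a hypothetical helper, and it also trims entries and drops empty ones before the `Distinct` call.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LabelLinks
{
    // Assumption: each link in the label is followed by "<br>" and no other
    // markup uses that tag.
    private static readonly string[] Separators = { "<br>" };

    // Hypothetical helper: splits the label's HTML back into links, removes
    // empty entries, and drops duplicates (LINQ's Distinct yields the first
    // occurrence of each item as it is encountered).
    public static List<string> FromLabelText(string labelText)
    {
        return labelText
            .Split(Separators, StringSplitOptions.RemoveEmptyEntries)
            .Select(s => s.Trim())
            .Where(s => s.Length > 0)
            .Distinct()
            .ToList();
    }
}
```

Inside the `using (StreamWriter sw = ...)` block, the loop would then be `foreach (var s in LabelLinks.FromLabelText(label.Text)) sw.WriteLine(s);`.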