I'm trying to load the page source of the following website using HtmlAgilityPack in C#. It always returns "Page Not Found", but when I open the site in a normal browser (Chrome) it displays all the content.
HtmlAgilityPack.HtmlWeb web = new HtmlWeb();
HtmlAgilityPack.HtmlDocument doc = web.Load("http://www.alfatah.pk/");
I get a 404 with your code as well. Somehow the server can tell that the request comes from a web robot rather than a human using a browser, presumably from the request headers.
This works for me:
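// Requires: using System.Net; using HtmlAgilityPack;
HtmlAgilityPack.HtmlWeb web = new HtmlWeb();
// Send a browser-like User-Agent instead of the library default.
web.UserAgent = "Mozilla/5.0 (Windows NT 6.2; Win64; x64; rv:63.0) Gecko/20100101 Firefox/63.0";
// Add the Accept and Accept-Language headers a browser would send.
web.PreRequest += (request) =>
{
    // Accept is a restricted header on HttpWebRequest in .NET Framework,
    // so it is set through the property rather than Headers.Add.
    request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
    request.Headers.Add(HttpRequestHeader.AcceptLanguage, "de-DE");
    return true; // returning true lets the request proceed
};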
HtmlAgilityPack.HtmlDocument doc = web.Load("http://www.alfatah.pk/");
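To check whether the extra headers actually made a difference, it can help to print the HTTP status and the page title after loading. The following is only a minimal sketch of that check as a complete console program. It assumes the HtmlAgilityPack NuGet package is referenced, and the class name Program and the printed messages are just illustrative. Which header the site really looks at is not known, so you may have to experiment with the values.

```csharp
using System;
using System.Net;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Same browser-like User-Agent as in the snippet above.
        var web = new HtmlWeb
        {
            UserAgent = "Mozilla/5.0 (Windows NT 6.2; Win64; x64; rv:63.0) Gecko/20100101 Firefox/63.0"
        };

        // Add the headers a browser would normally send.
        web.PreRequest += request =>
        {
            request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
            request.Headers.Add(HttpRequestHeader.AcceptLanguage, "de-DE");
            return true; // returning false would cancel the request
        };

        HtmlDocument doc = web.Load("http://www.alfatah.pk/");

        // HtmlWeb records the status code of the last request.
        Console.WriteLine($"Status: {(int)web.StatusCode} {web.StatusCode}");

        // Print the page title as a quick sanity check of the content.
        HtmlNode title = doc.DocumentNode.SelectSingleNode("//title");
        Console.WriteLine(title != null
            ? title.InnerText.Trim()
            : "(no <title> element found)");
    }
}
```

If the status is still 404, try removing or changing one header at a time to find out which one the server reacts to.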