
How can we get absolute URLs from the page source when scraping with HTML Agility Pack?


I am scraping an HTML page with HTML Agility Pack, but I am unable to convert the relative URLs in the page to absolute URLs.

I am using this code:

HtmlAgilityPack.HtmlDocument doc = web.Load(serviceStatusHTMLURL);
data = doc.DocumentNode.SelectSingleNode("//div[@id='columnRight']").OuterHtml;

I need to scrape the whole page with all HTML tags.


Solution

  • Since you need all of the HTML content of the page, change the second line to the code below, which covers the whole page instead of a single div.

    // OuterHtml keeps the markup; InnerText would return only the text with the tags stripped.
    data = doc.DocumentNode.OuterHtml;

    DocumentNode is the root of the parsed document, so the entire page's content is available under it.
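
Selecting a different node does not change the URLs themselves: HTML Agility Pack returns attribute values as they appear in the source. One way to make them absolute is to resolve each `href` and `src` against the page address with `System.Uri`, roughly as sketched below. The names `AbsoluteUrlRewriter`, `ToAbsolute`, `MakeUrlsAbsolute` and `LoadWithAbsoluteUrls` are hypothetical. The sketch assumes the page address is the correct base, so it ignores any `<base href>` element and any redirect.

```csharp
using System;
using HtmlAgilityPack;

public static class AbsoluteUrlRewriter
{
    // Resolves href against baseUri. Returns null when the value is missing
    // or cannot be parsed. Values that are already absolute come back unchanged
    // apart from normalization by System.Uri.
    public static string ToAbsolute(Uri baseUri, string href)
    {
        if (href == null) return null;
        try
        {
            return new Uri(baseUri, href).AbsoluteUri;
        }
        catch (UriFormatException)
        {
            return null;
        }
    }

    // Rewrites every href and src attribute in the document in place.
    // Assumption: these two attributes are the only ones that matter here.
    public static void MakeUrlsAbsolute(HtmlDocument doc, Uri baseUri)
    {
        foreach (string attributeName in new[] { "href", "src" })
        {
            // SelectNodes returns null, not an empty collection, when nothing matches.
            HtmlNodeCollection nodes =
                doc.DocumentNode.SelectNodes("//*[@" + attributeName + "]");
            if (nodes == null) continue;

            foreach (HtmlNode node in nodes)
            {
                string value = node.GetAttributeValue(attributeName, string.Empty);
                string absolute = ToAbsolute(baseUri, value);
                if (absolute != null)
                {
                    node.SetAttributeValue(attributeName, absolute);
                }
            }
        }
    }

    // Loads a page and returns its full HTML with the URLs made absolute.
    public static string LoadWithAbsoluteUrls(string pageUrl)
    {
        HtmlDocument doc = new HtmlWeb().Load(pageUrl);
        MakeUrlsAbsolute(doc, new Uri(pageUrl));
        return doc.DocumentNode.OuterHtml;
    }
}
```

With this in place, `data = AbsoluteUrlRewriter.LoadWithAbsoluteUrls(serviceStatusHTMLURL);` would stand in for the two lines in the question. To keep only the `columnRight` div, call `MakeUrlsAbsolute` on the loaded document first and then read `OuterHtml` from the selected node.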