asp.net, html, relative-path, absolute-path

Relative to absolute paths in HTML


I need to create a newsletter from a URL. To do that, I:

  1. Create a WebClient.
  2. Call WebClient's DownloadData method to get the page source as a byte array.
  3. Convert the byte array to a string and use it as the newsletter content.

However, I have trouble with paths. All of the elements' sources are relative (/img/welcome.png), but I need absolute ones, such as http://www.example.com/img/welcome.png.

How can I do this?


Solution

  • One possible way to solve this is to use the HtmlAgilityPack library.

    An example (fixing links):

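    // Requires: using System.Net; using System.Text; using HtmlAgilityPack;
    WebClient client = new WebClient();
    byte[] requestHTML = client.DownloadData(sourceUrl);
    // Assumes the page is UTF-8 encoded.
    string sourceHTML = Encoding.UTF8.GetString(requestHTML);

    HtmlDocument htmlDoc = new HtmlDocument();
    htmlDoc.LoadHtml(sourceHTML);

    // SelectNodes returns null (not an empty collection) when nothing matches.
    HtmlNodeCollection links = htmlDoc.DocumentNode.SelectNodes("//a[@href]");
    if (links != null)
    {
        foreach (HtmlNode link in links)
        {
            HtmlAttribute att = link.Attributes["href"];
            if (!string.IsNullOrEmpty(att.Value))
            {
                // AbsoluteUrlByRelative is a helper that is not shown in this snippet.
                att.Value = this.AbsoluteUrlByRelative(att.Value);
            }
        }
    }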
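
The snippet above calls AbsoluteUrlByRelative without showing its body. One way such a helper might look is sketched below, using System.Uri to resolve a relative path against a base address. The UrlHelper class name and the baseUrl parameter are assumptions made for illustration; the original helper takes a single argument and presumably gets the site address from elsewhere.

```csharp
using System;

public static class UrlHelper
{
    // Hypothetical helper (not the answer author's code): resolves relativeUrl
    // against baseUrl. URLs that are already absolute come back unchanged;
    // values that cannot be parsed at all are returned as-is.
    // Throws UriFormatException if baseUrl itself is not a valid absolute URL.
    public static string AbsoluteUrlByRelative(string baseUrl, string relativeUrl)
    {
        if (string.IsNullOrWhiteSpace(relativeUrl))
        {
            return relativeUrl;
        }

        Uri baseUri = new Uri(baseUrl, UriKind.Absolute);
        Uri resolved;
        return Uri.TryCreate(baseUri, relativeUrl, out resolved)
            ? resolved.AbsoluteUri
            : relativeUrl;
    }
}
```

With the values from the question, UrlHelper.AbsoluteUrlByRelative("http://www.example.com/", "/img/welcome.png") returns http://www.example.com/img/welcome.png. The same loop can probably be repeated with the XPath //img[@src] and the src attribute to cover images, which is what the question asks about.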