I'm trying to get the absolute URI of each anchor tag on a Wikipedia page. I thought the .href property would give me the absolute URI, but when I try it in CsQuery it still returns the relative URI. How can I get the absolute URI?
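using System;
using CsQuery;

class Program
{
    static void Main(string[] args)
    {
        string url = "https://en.wikipedia.org/wiki/Barack_Obama";
        var dom = CQ.CreateFromUrl(url);
        // Every link inside the article body
        var selected = dom["div#mw-content-text a"];
        foreach (var a in selected)
            // Prints the href attribute exactly as written in the markup (often a relative path)
            Console.WriteLine(a["href"]);
    }
}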
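CsQuery returns the href attribute exactly as it appears in the HTML source. Unlike the href property of a link in a browser's DOM, it does not resolve relative links against the page URL.
A simple workaround is to prepend the domain to every link that is not already absolute: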
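using System.Collections.Generic;
using CsQuery;

string url = "https://en.wikipedia.org/wiki/Barack_Obama";
string domain = "https://en.wikipedia.org";
var dom = CQ.CreateFromUrl(url);
List<string> urls = new List<string>();
// Visit only the anchors that actually carry an href attribute
dom["a[href]"].Each(element => {
    string href = element.GetAttribute("href");
    // Anything not starting with "http" (covers http:// and https://) is treated as site-relative
    if (!href.StartsWith("http"))
        href = domain + href;
    urls.Add(href);
});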
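A more robust option than string concatenation is to let System.Uri resolve each href against the page URL. Uri.TryCreate(Uri baseUri, string relativeUri, out Uri result) applies standard relative-reference resolution, so it also copes with protocol-relative links (//host/path), fragment-only links (#...) and links that are already absolute, which a simple prefix check can mishandle. This is a minimal sketch, not CsQuery API: LinkResolver and ToAbsolute are hypothetical names, and the sample hrefs are illustrative values, not links scraped from the page.

```csharp
using System;
using System.Collections.Generic;

static class LinkResolver
{
    // Hypothetical helper: resolves each href against the page URL with System.Uri,
    // the same way relative references are resolved in a browser.
    public static List<string> ToAbsolute(string pageUrl, IEnumerable<string> hrefs)
    {
        var baseUri = new Uri(pageUrl);
        var result = new List<string>();
        foreach (var href in hrefs)
        {
            // TryCreate skips values it cannot parse instead of throwing
            if (Uri.TryCreate(baseUri, href, out var absolute))
                result.Add(absolute.AbsoluteUri);
        }
        return result;
    }
}

class Program
{
    static void Main()
    {
        // Illustrative sample hrefs in shapes commonly seen on Wikipedia pages
        var hrefs = new[]
        {
            "/wiki/Honolulu",                     // site-relative path
            "//upload.wikimedia.org/example.jpg", // protocol-relative
            "#cite_note-1",                       // fragment only
            "http://example.com/page",            // already absolute
        };
        var links = LinkResolver.ToAbsolute("https://en.wikipedia.org/wiki/Barack_Obama", hrefs);
        foreach (var link in links)
            Console.WriteLine(link);
    }
}
```

With CsQuery you would collect element.GetAttribute("href") for each element in dom["a[href]"] and pass that list to ToAbsolute, using the same URL you passed to CQ.CreateFromUrl as the base. Resolving against the page URL rather than a hard-coded domain also keeps the code correct if you later scrape pages on other hosts.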