c#, jquery, csquery

How should I get the absolute URL in CsQuery?


I'm trying to get the absolute URI of each anchor tag on a Wikipedia page. I expected the href property to give the absolute URI, but in CsQuery it still returns the relative URI. How should I get the absolute URI?

    static void Main(string[] args)
    {
        string url = "https://en.wikipedia.org/wiki/Barack_Obama";
        var dom = CQ.CreateFromUrl(url);
        var selected = dom["div#mw-content-text a"];
        foreach (var a in selected)
            Console.WriteLine(a["href"]);
    }

Solution

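  • CsQuery returns attribute values exactly as they appear in the HTML page, so a relative href stays relative.

    You can build the absolute URL yourself by prefixing the domain: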

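     // Requires: using System.Collections.Generic; using CsQuery;
     // "url" is the page address from the question.
     string domain = "https://en.wikipedia.org";

     var dom = CQ.CreateFromUrl(url);

     List<string> urls = new List<string>();

     // The lambda parameter needs its own name: reusing "dom" or "url" does not compile.
     dom["a[href]"].Each(a => {
        string href = a.GetAttribute("href");
        // Prefix site-relative links such as "/wiki/..." with the domain.
        // Protocol-relative ("//host/...") and fragment ("#...") links need separate handling.
        if (!href.StartsWith("http"))
           href = domain + href;

        urls.Add(href);
     });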
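
String prefixing only covers links that start with a single slash. As a more defensive sketch, the standard System.Uri class can resolve any href against the page address, which also handles fragment-only and already-absolute links. The names LinkResolver and ToAbsolute are hypothetical, and the hrefs are assumed to be plain strings, for example the values returned by GetAttribute("href") in the loop above.

```csharp
using System;
using System.Collections.Generic;

public static class LinkResolver
{
    // Hypothetical helper: resolves each href against the address of the page it came from.
    public static List<string> ToAbsolute(string pageUrl, IEnumerable<string> hrefs)
    {
        var baseUri = new Uri(pageUrl);
        var result = new List<string>();

        foreach (string href in hrefs)
        {
            // Skip empty attribute values.
            if (string.IsNullOrWhiteSpace(href))
                continue;

            // Uri.TryCreate combines the base and the reference;
            // hrefs that cannot be resolved are dropped instead of throwing.
            if (Uri.TryCreate(baseUri, href, out Uri absolute))
                result.Add(absolute.AbsoluteUri);
        }

        return result;
    }
}
```

Calling LinkResolver.ToAbsolute(url, hrefs) with the page address from the question returns one absolute URL per resolvable href. Passing the full page URL rather than just the domain matters here, because fragment links are resolved relative to the page itself.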