
Get full href list from a QWebPage


I am trying to use a QWebPage (from QtWebKit) to list the href attribute of every A tag as a full URL. At the moment, I do this:

QWebElementCollection collection = webPage->mainFrame()->findAllElements("a");
foreach (QWebElement element, collection)
{
    QString href = element.attribute("href");
    if (!href.isEmpty())
    {
        // Process
    }
}

The problem is that an href can be a full URL, just a page name, a path with / at the front, or a path with ../ at the front. Is there a way to resolve all of these forms into the full URL, as a QString or a QUrl?


Solution

  • QWebFrame has a function named baseUrl, which returns the QUrl that relative URLs in the page are resolved against.

    On that base URL you can call the resolved function, passing a separate QUrl built from the href. If the href is relative, resolved merges it with the base URL and returns the absolute URL. If it is not relative, it is returned unmodified.

    Here's an (untested) example based on the code you provided:

    // Base URL that relative links in the page are resolved against.
    QUrl baseUrl = webPage->mainFrame()->baseUrl();

    QWebElementCollection collection = webPage->mainFrame()->findAllElements("a");
    foreach (QWebElement element, collection)
    {
        QString href = element.attribute("href");
        if (!href.isEmpty())
        {
            QUrl relativeUrl(href);

            // Relative hrefs are merged with baseUrl; absolute ones come back unchanged.
            QUrl absoluteUrl = baseUrl.resolved(relativeUrl);

            // Process absoluteUrl
        }
    }