Remove `www.` from QUrl in Qt 5.5


In another part of my program I read various URLs from my browser. Say I have http://www.example.com as well as http://example.com and https://example.com. To a browser, these three URLs are different. To me, only the 'base' domain (example.com) matters.

I am now trying to strip the www. from the domain, but have not succeeded. I'd like to do this with the QUrl class itself rather than checking whether the string contains www. and removing it afterwards. As you can see, it's more of a design decision here ;)

Here's my current application.

main.cpp

#include <QApplication>
#include <QDebug>
#include <QUrl>
#include <QList>

int main(int argc, char *argv[])
{
    QList<QUrl> urlList;
    urlList << QUrl("http://example.com/qwe/whoami/123#123141");

    urlList << QUrl("chrome://newtab/");
    urlList << QUrl("favorites://");
    urlList << QUrl("");

    urlList << QUrl("https://www.google.de/");
    urlList << QUrl("https://google.de/");
    urlList << QUrl("https://www.youtube.com/watch?v=XTPGpBBqwe");

    urlList << QUrl("https://youtube.com/watch?v=189273ijadzqiuwejk");
    urlList << QUrl("http://raspberrypi.stackexchange.com/questions/10371/whoisthisyo");
    urlList << QUrl("https://stackoverflow.com/questions/33478464/alfresco-custom");

    urlList << QUrl("http://localhost:3000");
    urlList << QUrl("localhost:3000");

    for (int i = 0; i < urlList.count(); i++) {
        qDebug() << "[" << i+1 << "] " << urlList[i].host();
    }


    return 0;
}

Thanks for your help!


Solution

  • QUrl provides no function for this out of the box.

    The best solution I can think of is to remove the "www." at the beginning of the host part of the URL, if it exists.

    Note that you should not remove any other occurrence of the string "www." in the host, or anywhere else in the URL, so we check whether QUrl::host() begins with "www." and, if so, remove those four characters from it.

    Also note that, technically, this changes the host name and could therefore lead you to a different website. (In practice, most websites serve the same content with or without the www. subdomain prefix for usability reasons, but nothing guarantees it.) It can also produce unintended results in special cases, for example where www. is not a subdomain at all: the domain www.com would be reduced to just com.

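    // Returns a copy of url with a leading "www." removed from its host.
    QUrl remove_www(QUrl url) {
        QString host = url.host();
        if (host.startsWith("www.")) {
            host = host.mid(4); // drop the first 4 characters ("www.")
            url.setHost(host);  // only touch the URL when the host changed
        }
        return url;
    }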

    Then use the return value of this function:

    for (int i = 0; i < urlList.count(); i++) {
        qDebug() << "[" << i+1 << "] " << remove_www(urlList[i]);
    }
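
The special case mentioned above (www.com becoming com) can be guarded against with one extra check. The following is only a minimal sketch in plain standard C++, so it can be tried without Qt. The helper name strip_www_prefix is hypothetical and not part of QUrl, and the rule "the remainder must still contain a dot" is an assumption that catches www.com but not every case.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of Qt): strips a leading "www." label from
// a host name, but only if what remains still looks like a domain name.
std::string strip_www_prefix(const std::string& host)
{
    static const std::string prefix = "www.";
    if (host.size() <= prefix.size())
        return host;

    // Compare the first four characters case-insensitively.
    for (std::size_t i = 0; i < prefix.size(); ++i) {
        if (std::tolower(static_cast<unsigned char>(host[i])) != prefix[i])
            return host;
    }

    const std::string rest = host.substr(prefix.size());
    // Assumption: leave hosts such as "www.com" alone, where "www" is the
    // registered name rather than a subdomain, by requiring a remaining dot.
    if (rest.find('.') == std::string::npos)
        return host;
    return rest;
}
```

With Qt, the host could be passed through this helper via url.host().toStdString() and written back with url.setHost(QString::fromStdString(...)). A host like www.co.uk would still be reduced to co.uk, so if such cases matter, a lookup against a public suffix list is probably needed instead of the simple dot check.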