I am trying to get the second-level domain (SLD) of a URL using Boost.URL. For example, if the URL is https://google.com, I want to store google in a std::string.
Here is a complete example:
#include <boost/url.hpp>
#include <iostream>
#include <string>

int main()
{
    std::string url_str = "https://google.com";
    auto result = boost::urls::parse_uri(url_str);  // boost::urls::result<url_view>
    boost::urls::url_view url = result.value();     // throws if parsing failed
    std::string protocol(url.scheme());
    std::string domain(url.host());
    std::cout << protocol << std::endl; // outputs `https`
    std::cout << domain << std::endl;   // outputs `google.com`, but I need `google`
    return 0;
}
I wrote my own function to extract the SLD, but I was wondering whether Boost.URL also provides one:
std::string get_sec_level_domain(const std::string& domain)
{
    std::size_t pos = domain.find_last_of('.');
    if (pos != std::string::npos && pos > 0)
    {
        std::string sld = domain.substr(0, pos);
        return sld;
    }
    return "";
}
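The function above returns everything before the last dot: google for google.com, but my.pretty.sub.domain for a host such as my.pretty.sub.domain.com. A minimal standard-library sketch that instead takes the label directly left of the top-level domain might look like this; split_labels and second_level_label are hypothetical helper names, not Boost.URL API:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: split a host name such as "my.sub.google.com"
// into its dot-separated labels {"my", "sub", "google", "com"}.
std::vector<std::string> split_labels(const std::string& host)
{
    std::vector<std::string> labels;
    std::size_t start = 0;
    for (;;) {
        std::size_t dot = host.find('.', start);
        // When dot == npos, substr clamps the length to the end of the string.
        labels.push_back(host.substr(start, dot - start));
        if (dot == std::string::npos)
            break;
        start = dot + 1;
    }
    return labels;
}

// Hypothetical helper: the second-to-last label, i.e. the label directly
// left of the top-level domain. Empty if there are fewer than two labels.
// Naive: knows nothing about multi-label public suffixes such as "co.uk".
std::string second_level_label(const std::string& host)
{
    std::vector<std::string> labels = split_labels(host);
    return labels.size() >= 2 ? labels[labels.size() - 2] : std::string();
}
```

For instance, second_level_label("my.pretty.sub.domain.com") gives "domain" and second_level_label("localhost") gives an empty string. Because the sketch does not consult the Public Suffix List, second_level_label("bbc.co.uk") gives "co" rather than "bbc".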
Boost.URL implements general-purpose URIs. Not all URIs contain fully qualified domain names, so parsing the scheme-dependent parts of the authority is mostly out of scope for the library.
However, since internet addresses are ubiquitous in network URIs (ftp, ssh, sftp, http, etc.), some support is there: host_type() tells you whether the host is a registered name, an IPv4, IPv6 or IPvFuture address, or absent. You can at least use that to avoid misinterpreting IP addresses as domain names.
As an example test bed:
#include <boost/url.hpp>
#include <iostream>

int main () {
    for (auto txt : {
             // explicit port
             "https://my.pretty.sub.domain.com:8989/path/to/resource?stuff=more&stuff#end",
             "https://my.com:8989/path/to/resource?stuff=more&stuff#end",
             "https://localhost:8989/path/to/resource?stuff=more&stuff#end",
             "https://[::1]:8989/path/to/resource?stuff=more&stuff#end",
             "https://127.0.0.1:8989/path/to/resource?stuff=more&stuff#end",
             // without port
             "https://my.pretty.sub.domain.com/path/to/resource?stuff=more&stuff#end",
             "https://my.com/path/to/resource?stuff=more&stuff#end",
             "https://localhost/path/to/resource?stuff=more&stuff#end",
             "https://[::1]/path/to/resource?stuff=more&stuff#end",
             "https://127.0.0.1/path/to/resource?stuff=more&stuff#end",
         }) {
        if (auto parsed = boost::urls::parse_uri(txt); parsed && parsed->has_authority()) {
            auto url = parsed.value();
            switch (url.host_type ())
            {
            case boost::urls::host_type::ipv4:
            case boost::urls::host_type::ipv6:
            case boost::urls::host_type::ipvfuture:
            case boost::urls::host_type::none:
                std::cerr << "address or none: '" << url.host () << "'\n";
                break;
            case boost::urls::host_type::name:
                std::cout << "maybe FQDN: '" << url.host_name () << "'\n";
                break;
            }
        }
    }
}
Printing:
maybe FQDN: 'my.pretty.sub.domain.com'
maybe FQDN: 'my.com'
maybe FQDN: 'localhost'
address or none: '[::1]'
address or none: '127.0.0.1'
maybe FQDN: 'my.pretty.sub.domain.com'
maybe FQDN: 'my.com'
maybe FQDN: 'localhost'
address or none: '[::1]'
address or none: '127.0.0.1'
Note: Coliru sadly doesn't let me share this example because the URLs trigger its spam detection, but you can see the output there by copy-pasting the code and building with:
g++ -std=c++20 -O2 -Wall -pedantic -pthread main.cpp -lboost_url && ./a.out