Extracting domain for "asterisk-prefixed" suffixes


I use tldextract (version 2.2.2) to extract subdomain/domain/suffix from URLs.

I recently noticed a result that surprised me:

>>> from tldextract import extract
>>> extract('http://althawrah.ye/archives/597366')
ExtractResult(subdomain='', domain='', suffix='althawrah.ye')

Instead of being picked up as the domain, althawrah is picked up as part of the suffix. Why is this?

Snooping around a bit, I noticed in the Public Suffix List itself that .ye is one of a small number of suffixes that use a leading asterisk, e.g.

// fj : https://en.wikipedia.org/wiki/.fj
*.fj
// ye : http://www.y.net.ye/services/domain_name.htm
*.ye

The implication here is that these suffixes do not allow domain names to be registered directly under the suffix; instead, names must be registered at the third level, under one of a set of second-level domains. However, this is not the case with http://althawrah.ye/; that is, althawrah is not listed as a second-level domain of .ye. So, what is going on here?


Solution

  • Based on the history of the list and the description of the process for updating it, it looks like the Yemen entry is simply wrong or out of date. The entry was added before 2007 (when the list was migrated from CVS to git), while the list guidelines state that:

    Changes [for ICANN Domains] need to either come from a representative of the registry (authenticated in a similar manner to below) or be from public sources such as a registry website.

    The website linked in the list (which hasn't changed since 2002) gives little detail but does mention URLs of the format www.yourcompany.com.ye, which is presumably where the *.ye rule came from. IANA's root zone database lists TeleYemen as the current TLD manager, but there is no mention of domain registration on their site. The Wikipedia list of supposed "second level domains" was added in 2008 by a Canadian user, citing a since-deleted website of a company called phpcomet (archived here) which claimed to sell domains under the listed second-level domains. However, a Google search for "site:ye" reveals plenty of sites outside those domains (e.g. press24.ye, ndc.ye) and fails to return any result for many of the listed ones (me.ye, co.ye, ltd.ye, plc.ye).

    I'm not sure what could be done to update the official list, but I wouldn't be surprised if the correct entry would read something like:

    ye
    com.ye
    edu.ye
    gov.ye
    org.ye
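
To see why the wildcard rule produces an empty domain, and what the entry suggested above would change, here is a minimal sketch of Public Suffix List matching in plain Python. It is not tldextract's implementation: the names public_suffix, split_host, CURRENT_RULES and PROPOSED_RULES are made up for illustration, and details such as IDN/punycode handling are left out.

```python
def public_suffix(host, rules):
    """Return the public suffix of host under a set of PSL-style rules."""
    labels = host.lower().strip(".").split(".")
    tails = [labels[i:] for i in range(len(labels))]  # longest first

    # Exception rules ("!name") take priority: the suffix is the rule
    # with its leftmost label removed.
    for tail in tails:
        if "!" + ".".join(tail) in rules:
            return ".".join(tail[1:])

    # Otherwise the longest matching rule wins. A rule matches when it
    # equals the tail, or equals it with the first label replaced by "*".
    for tail in tails:
        exact = ".".join(tail)
        wildcard = ".".join(["*"] + tail[1:])
        if exact in rules or wildcard in rules:
            return exact

    # No rule matched: fall back to the implicit "*" rule (last label).
    return labels[-1]


def split_host(host, rules):
    """Split host into (subdomain, domain, suffix), roughly like tldextract."""
    labels = host.lower().strip(".").split(".")
    suffix = public_suffix(host, rules)
    rest = labels[: len(labels) - len(suffix.split("."))]
    domain = rest[-1] if rest else ""
    return ".".join(rest[:-1]), domain, suffix


# The rule as it appears in the list today (hypothetical variable name).
CURRENT_RULES = {"*.ye"}
# The entry suggested in the answer above; an assumption, not an official list.
PROPOSED_RULES = {"ye", "com.ye", "edu.ye", "gov.ye", "org.ye"}

print(split_host("althawrah.ye", CURRENT_RULES))
# ('', '', 'althawrah.ye')
print(split_host("althawrah.ye", PROPOSED_RULES))
# ('', 'althawrah', 'ye')
print(split_host("www.yourcompany.com.ye", PROPOSED_RULES))
# ('www', 'yourcompany', 'com.ye')
```

With *.ye, every label directly under .ye is itself treated as part of the suffix, so nothing is left over to serve as the domain. With explicit rules, althawrah.ye splits as expected, while names under com.ye are handled the same way as before.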