utf-8, dns, subdomain, idn

UTF-8 in subdomain?


Is it possible to use UTF-8 in a subdomain? If so, which characters are allowed and how does the can't-mix-encodings thing work?

I've tried to RTFM, but Google wasn't of much help.


Solution

  • There isn't much that is special about subdomains. A given domain name such as foo.example.com is an ordered list of labels (foo, example, com), so the real question is whether you can use UTF-8 in a given label.

    The low level answer is that a label is defined as:

    <label> ::= <letter> [ [ <ldh-str> ] <let-dig> ]
    <let-dig> ::= <letter> | <digit>
    <letter> ::= any one of the 52 alphabetic characters A through Z in upper case and a through z in lower case
    <digit> ::= any one of the ten digits 0 through 9
    <ldh-str> ::= <let-dig-hyp> | <let-dig-hyp> <ldh-str>
    <let-dig-hyp> ::= <let-dig> | "-"
    

    which means that a label can only contain characters from [-a-zA-Z0-9]; it must also start with a letter and end with a letter or digit.
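    The grammar above can be approximated with a regular expression. This is a minimal Python sketch rather than a full validator: the name is_rfc1035_label is made up for illustration, and the 63-octet limit comes from RFC 1035's label length restriction, not from the grammar itself.

```python
import re

# <label> ::= <letter> [ [ <ldh-str> ] <let-dig> ]
# i.e. a letter, optionally followed by letters/digits/hyphens
# that end in a letter or digit.
LABEL_RE = re.compile(r"[A-Za-z](?:[A-Za-z0-9-]*[A-Za-z0-9])?")


def is_rfc1035_label(label: str) -> bool:
    """Return True if `label` matches the RFC 1035 <label> grammar."""
    # RFC 1035 also limits a label to 63 octets.
    return len(label) <= 63 and LABEL_RE.fullmatch(label) is not None


print(is_rfc1035_label("foo"))            # True
print(is_rfc1035_label("foo-"))           # False: must not end with a hyphen
print(is_rfc1035_label("bücher"))         # False: "ü" is outside [-a-zA-Z0-9]
print(is_rfc1035_label("xn--bcher-kva"))  # True: an IDNA-encoded label is plain LDH
```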

    However, IDNA (Internationalizing Domain Names in Applications) can be used to represent Unicode characters using only those ASCII characters. In short, a label containing other characters is encoded as: "xn--" + punycode(nameprep(label)).
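    As a rough illustration of that formula, Python's standard library provides the pieces: encodings.idna.nameprep, the "punycode" codec, and an "idna" codec that handles whole domain names under the IDNA 2003 rules. The helper name to_ascii_label is hypothetical, and the sketch skips the length checks and other validation a real implementation would need.

```python
import encodings.idna


def to_ascii_label(label: str) -> str:
    """Encode one label as "xn--" + punycode(nameprep(label))."""
    prepped = encodings.idna.nameprep(label)  # case folding, NFKC, prohibited-character checks
    if prepped.isascii():
        # Nothing to encode: the label is already plain ASCII.
        return prepped
    return "xn--" + prepped.encode("punycode").decode("ascii")


print(to_ascii_label("bücher"))  # xn--bcher-kva

# The built-in "idna" codec applies the same steps to every label of a name.
print("bücher.example.com".encode("idna"))         # b'xn--bcher-kva.example.com'
print(b"xn--bcher-kva.example.com".decode("idna"))  # bücher.example.com
```

    So the DNS itself never sees UTF-8 here; it only sees the ASCII form, and the application converts back for display.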

    As for limitations, at least:

    • four characters can't be in an IDN label (U+002E, U+3002, U+FF0E, U+FF61), because IDNA treats all of them as label separators (dots).
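    To see why those four code points cannot appear inside a label, here is a small Python sketch that splits a name the way IDNA describes, treating each of them as a dot. The function name split_labels is an assumption made for this example.

```python
import re

# U+002E full stop, U+3002 ideographic full stop,
# U+FF0E fullwidth full stop, U+FF61 halfwidth ideographic full stop
IDNA_DOTS = re.compile("[\u002E\u3002\uFF0E\uFF61]")


def split_labels(name: str) -> list:
    """Split a domain name into labels on any of the four IDNA separators."""
    return IDNA_DOTS.split(name)


print(split_labels("foo\u3002example\uFF0Ecom"))  # ['foo', 'example', 'com']
```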