I have an application with an address bar, and users type in an IRI to which I must connect.
On Unix/Darwin, this is simple: I flatten the IRI to a URI as described in RFC 3987. That is, if the scheme has an authority component, I map its host name to ASCII with punycode, then percent-encode any non-ASCII characters in the rest of the IRI.
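A minimal sketch of that flattening step, assuming Python's built-in "idna" codec (IDNA 2003 ToASCII) is an acceptable stand-in for the punycode mapping. The function name iri_to_uri and the _SAFE character set are illustrative choices, not part of any standard API:

```python
from urllib.parse import quote, urlsplit, urlunsplit

# ASCII characters quote() must leave alone: URI delimiters plus "%", so
# existing percent-escapes survive. Anything else outside the always-safe set
# (letters, digits, "_.-~") is UTF-8 encoded and percent-escaped.
_SAFE = "!#$%&'()*+,/:;=?@[]"


def iri_to_uri(iri: str) -> str:
    """Flatten an IRI to a URI: IDNA/punycode for the host, percent-encoding elsewhere."""
    scheme, netloc, path, query, fragment = urlsplit(iri)
    if netloc:
        userinfo, at, hostport = netloc.rpartition("@")
        if hostport.startswith("["):
            # IPv6 literal (possibly with a port): already ASCII, keep as-is.
            host, port = hostport, ""
        else:
            host, colon, port = hostport.partition(":")
            port = colon + port
            # Python's "idna" codec applies IDNA 2003 ToASCII to each label.
            host = host.encode("idna").decode("ascii")
        netloc = (quote(userinfo, safe=_SAFE) + at if at else "") + host + port
    return urlunsplit((
        scheme,
        netloc,
        quote(path, safe=_SAFE),
        quote(query, safe=_SAFE),
        quote(fragment, safe=_SAFE),
    ))


print(iri_to_uri("http://☃.net/"))  # → http://xn--n3h.net/
print(iri_to_uri("http://pöp.internal.corp.com/café?q=ü"))
# → http://xn--pp-fka.internal.corp.com/caf%C3%A9?q=%C3%BC
```

Note that this sketch uses IDNA 2003 because that is what the standard library ships; a client that needs IDNA 2008 / UTS #46 behaviour would swap in a different mapping for the host.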
On Windows, there are two possibilities. Either the domain name is a normal Internet one, in which case it should be mapped to ASCII with punycode and looked up with normal DNS, or the domain name is a weird Windows one (e.g. served by an Active Directory DNS server), in which case the lookup should in fact use the UTF-8 name.
For example:
- http://☃.net: call getaddrinfo(node="xn--n3h.net").
- http://dryden.internal.corp.com: calling getaddrinfo(node="dryden.internal.corp.com") will work fine.
- http://pöp.internal.corp.com: getaddrinfo(node="xn--pp-fka.internal.corp.com") does not work if "pöp" is a machine name published by UTF-8 DNS, but GetAddrInfoW(pNodeName=L"pöp.internal.corp.com") works.

Firefox and Chrome both apply punycode to every IRI straight away, so they can't resolve the weird Microsoft domains.
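To make the difference between the two lookups concrete, here is a sketch of the DNS query name (QNAME) each one would put on the wire. It assumes that the "UTF-8 DNS" lookup sends the raw UTF-8 bytes of each label, while the punycode lookup sends the IDNA ASCII form; encode_qname is a hypothetical helper, not a system API:

```python
def encode_qname(host: str, utf8: bool) -> bytes:
    """Build the DNS wire-format QNAME for host: length-prefixed labels, then a 0 byte.

    utf8=False: each label goes through IDNA 2003 ToASCII (punycode).
    utf8=True:  each label is sent as raw UTF-8 bytes (assumed "UTF-8 DNS" form).
    """
    out = bytearray()
    for label in host.rstrip(".").split("."):
        raw = label.encode("utf-8") if utf8 else label.encode("idna")
        if not 0 < len(raw) <= 63:  # RFC 1035 label length limit
            raise ValueError(f"bad label length in {host!r}: {label!r}")
        out.append(len(raw))
        out += raw
    out.append(0)  # the empty root label terminates the name
    return bytes(out)


print(encode_qname("pöp.internal.corp.com", utf8=False))
# → b'\nxn--pp-fka\x08internal\x04corp\x03com\x00'
print(encode_qname("pöp.internal.corp.com", utf8=True))
# → b'\x04p\xc3\xb6p\x08internal\x04corp\x03com\x00'
```

Only the first label differs, which is why a server that publishes "pöp" under its UTF-8 bytes never matches the punycode query.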
What guidelines are there for handling IRIs in such an environment? Are there any recommended ways of guessing which sort of DNS lookup should be done, punycode or UTF-8 DNS? What do other applications do?
My current best attempt at a solution is to try punycode first if the TLD is public, but skip punycode if the TLD is internal (acme.com might serve public things; acme.ltd is probably an intranet). If punycode failed or was skipped, I try a UTF-8 query.
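That heuristic could be sketched as follows. This is an illustration under assumptions: the INTERNAL_TLDS set is a made-up example (the real list is site-specific), and the two lookup functions are caller-supplied and hypothetical, returning a list of addresses or None on failure (in practice one would wrap getaddrinfo, the other GetAddrInfoW):

```python
from typing import Callable, FrozenSet, List, Optional

# Hypothetical lookup signature: list of addresses, or None on failure.
Lookup = Callable[[str], Optional[List[str]]]

# Assumption: example "internal" TLDs only; adjust for the actual network.
INTERNAL_TLDS: FrozenSet[str] = frozenset({"corp", "home", "internal", "lan"})


def resolve_host(
    host: str,
    ascii_lookup: Lookup,
    utf8_lookup: Lookup,
    internal_tlds: FrozenSet[str] = INTERNAL_TLDS,
) -> Optional[List[str]]:
    """Punycode first for public TLDs, skip it for internal ones, then try UTF-8."""
    if host.isascii():
        # Punycode and UTF-8 forms are identical, so one lookup is enough.
        return ascii_lookup(host)
    tld = host.rstrip(".").rsplit(".", 1)[-1].lower()
    if tld not in internal_tlds:
        try:
            result = ascii_lookup(host.encode("idna").decode("ascii"))
        except UnicodeError:
            result = None  # not valid IDNA; only the UTF-8 form can work
        if result:
            return result
    # Punycode failed or was skipped: fall back to the UTF-8 name.
    return utf8_lookup(host)
```

With this example set, resolve_host("pöp.corp", ...) never issues the punycode query, while resolve_host("pöp.internal.corp.com", ...) tries punycode first because its TLD is "com".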
There is one workaround, at some cost in response time: if nothing else helps, you can make two calls, one with each method, and take the response from the first one to succeed.
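One way to read this workaround is to issue both lookups concurrently and keep the first successful answer. A minimal sketch, assuming caller-supplied lookup functions that return a list of addresses, return None, or raise OSError on failure; race_lookups is a hypothetical name:

```python
import concurrent.futures as cf
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical lookup signature: list of addresses, or None on failure.
Lookup = Callable[[str], Optional[List[str]]]


def race_lookups(candidates: Sequence[Tuple[Lookup, str]]) -> Optional[List[str]]:
    """Run every (lookup, name) pair concurrently; return the first non-empty result."""
    pool = cf.ThreadPoolExecutor(max_workers=max(1, len(candidates)))
    try:
        futures = [pool.submit(lookup, name) for lookup, name in candidates]
        for fut in cf.as_completed(futures):
            try:
                result = fut.result()
            except OSError:
                continue  # e.g. socket.gaierror; keep waiting for the other lookup
            if result:
                return result
        return None  # every lookup failed
    finally:
        # Return without waiting for the slower lookup; its thread finishes
        # in the background and its result is discarded.
        pool.shutdown(wait=False)
```

A caller would pass something like [(ascii_lookup, "xn--pp-fka.internal.corp.com"), (utf8_lookup, "pöp.internal.corp.com")], where both lookup functions are hypothetical wrappers around getaddrinfo and GetAddrInfoW. If both names happen to resolve to different hosts, the winner depends on timing, so this is best kept as a last resort.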