java, url, url-validation, punycode

Java - Must the protocol be removed before calling IDN.toASCII()?


I created a URL validator for my JSF web page and have now stumbled across a problem with domains whose first label (the part before the first dot) contains a non-ASCII character.

I have the following valid website URL:

http://testä.com

Converting it to Punycode using IDN.toASCII() produces an invalid URL:

xn--http://test-v8a.com

Shouldn't it be http://xn--test-ooa.com/ ?

I also checked it with DENIC, the registry for German .de domains, which shows the same invalid result.

Is this a bug in Java or the RFC, or am I missing something?


Workaround

When I remove the protocol first, it works.
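
A minimal sketch that reproduces both cases, for illustration only. The class name IdnWorkaroundDemo is made up for this example, and the non-ASCII character is written as a Unicode escape so the source file's encoding does not matter.

```java
import java.net.IDN;

// Hypothetical demo class, not part of the original question.
class IdnWorkaroundDemo {
    public static void main(String[] args) {
        String host = "test\u00e4.com"; // "testä.com"

        // Whole URL passed in: "http://testä" is treated as a single label.
        System.out.println(IDN.toASCII("http://" + host)); // xn--http://test-v8a.com (as reported above)

        // Protocol removed first: only the domain name is converted.
        System.out.println(IDN.toASCII(host));             // xn--test-ooa.com
    }
}
```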


Solution

  • The documentation is clear that this method operates only on domain names and their labels, so yes, the protocol needs to be removed before calling it.

    A label is an individual part of a domain name. The original ToASCII operation, as defined in RFC 3490, only operates on a single label. This method can handle both label and entire domain name, by assuming that labels in a domain name are always separated by dots.

    Link to Javadoc: https://docs.oracle.com/javase/8/docs/api/java/net/IDN.html#toASCII-java.lang.String-int-
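
Following the answer above, one way to apply this in a validator is to cut out the host, convert only that part, and put the URL back together. The sketch below is an assumption-laden illustration, not a complete URL parser: the class IdnUrlSketch and the method toAsciiUrl are hypothetical names, and the input is assumed to look like scheme://host[:port][/path] with no user info and no IPv6 literal. It also passes IDN.USE_STD3_ASCII_RULES, an existing flag of java.net.IDN, so that stray characters such as ':' or '/' in a label are rejected instead of being encoded.

```java
import java.net.IDN;

// Hypothetical helper class, shown only as a sketch.
class IdnUrlSketch {

    // Converts only the host part of a URL to its ASCII (Punycode) form.
    // Assumes scheme://host[:port][/path...]; no user info, no IPv6 literal.
    public static String toAsciiUrl(String url) {
        int schemeEnd = url.indexOf("://");
        int hostStart = (schemeEnd < 0) ? 0 : schemeEnd + 3;

        // The host ends at the first ':', '/', '?' or '#' after the scheme,
        // or at the end of the input if none of them occurs.
        int hostEnd = url.length();
        for (int i = hostStart; i < url.length(); i++) {
            char c = url.charAt(i);
            if (c == ':' || c == '/' || c == '?' || c == '#') {
                hostEnd = i;
                break;
            }
        }

        String host = url.substring(hostStart, hostEnd);
        // With USE_STD3_ASCII_RULES, ASCII characters other than letters,
        // digits and hyphen cause an IllegalArgumentException.
        String asciiHost = IDN.toASCII(host, IDN.USE_STD3_ASCII_RULES);

        return url.substring(0, hostStart) + asciiHost + url.substring(hostEnd);
    }

    public static void main(String[] args) {
        // "\u00e4" is the character ä
        System.out.println(toAsciiUrl("http://test\u00e4.com/index.html"));
        // http://xn--test-ooa.com/index.html
    }
}
```

With the same flag, passing the complete URL straight to IDN.toASCII should fail with an IllegalArgumentException rather than silently returning a string like the one in the question, which may be the more useful behaviour inside a validator.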