The java.net.URI ctor accepts most non-ASCII characters but does not accept ideographic space (0x3000). The ctor fails with java.net.URISyntaxException: Illegal character in path ...
So my questions are:
1. Why doesn't the URI ctor accept 0x3000 when it does accept other non-ASCII characters?
2. What other characters are disallowed?

Please note the 1st example contains the ideographic space rather than a regular space.
It is the ideographic space that is the problem.
Here is the code that allows non-ASCII characters to be used:
    } else if ((c > 128)
               && !Character.isSpaceChar(c)
               && !Character.isISOControl(c)) {
        // Allow unescaped but visible non-US-ASCII chars
        return p + 1;
    }
As you can see, it disallows "funky" non-visible characters.
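To see the effect of that check from the outside, here is a minimal sketch that feeds a few strings to the constructor. The class name, the helper `parses`, and the example.com URLs are made up for illustration; only `java.net.URI` and `URISyntaxException` come from the JDK.

```java
import java.net.URI;
import java.net.URISyntaxException;

class IdeographicSpaceDemo {
    // Hypothetical helper: true if the java.net.URI constructor accepts the string.
    static boolean parses(String s) {
        try {
            new URI(s);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // A visible non-ASCII letter (e with acute accent) in the path.
        System.out.println(parses("http://example.com/caf\u00e9"));   // true
        // The ideographic space U+3000 in the path.
        System.out.println(parses("http://example.com/a\u3000b"));    // false
        // The same character percent-encoded as its UTF-8 bytes.
        System.out.println(parses("http://example.com/a%E3%80%80b")); // true
    }
}
```

The last case suggests one workaround: percent-encode the offending character before handing the string to the constructor.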
See also the URI class javadocs, which specify which characters are allowed (by the class!) in each component of a URI.
Why?
It is probably a safety measure.
What others are disallowed?
Any non-ASCII character that is a space character or a control character ... according to the respective Character predicate methods, Character.isSpaceChar and Character.isISOControl. (See the Character javadocs for a precise specification.)
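If you want to test individual characters yourself, a rough sketch is to copy the condition from the excerpt into a small predicate. The class and method names here are hypothetical, and this only mirrors the non-ASCII branch shown above, not the whole URI parser (ASCII characters are handled by other rules).

```java
class UriCharCheck {
    // Mirrors the quoted condition: non-ASCII, not a space char, not an ISO control.
    static boolean allowedUnescaped(char c) {
        return c > 128
                && !Character.isSpaceChar(c)
                && !Character.isISOControl(c);
    }

    public static void main(String[] args) {
        char[] samples = {
            '\u00e9', // e with acute accent
            '\u65e5', // a CJK ideograph
            '\u3000', // ideographic space
            '\u00a0', // no-break space
            '\u2028', // line separator
            '\u0085'  // next line (a C1 control)
        };
        for (char c : samples) {
            System.out.printf("U+%04X allowed unescaped: %b%n", (int) c, allowedUnescaped(c));
        }
    }
}
```

Under this predicate the first two samples are allowed and the remaining four are not.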
You should also note that this is a deviation from the URI specification. The URI specification says that non-ASCII characters are only allowed if you:
My understanding is that the URI.toASCIIString() method will take care of that if you have a "deviant" java.net.URI object.
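As a small illustration of that, the sketch below builds a URI containing an unescaped non-ASCII letter and prints both forms. The class name, the helper `toAscii`, and the example.com URL are assumptions for the example; `URI.create` and `toASCIIString()` are the real JDK methods.

```java
import java.net.URI;

class ToAsciiDemo {
    // Hypothetical helper: parse the string, then return the US-ASCII-only form.
    static String toAscii(String s) {
        return URI.create(s).toASCIIString();
    }

    public static void main(String[] args) {
        String input = "http://example.com/caf\u00e9";
        // toString() keeps the non-ASCII letter as it was given.
        System.out.println(URI.create(input));
        // toASCIIString() percent-encodes it as UTF-8 bytes.
        System.out.println(toAscii(input)); // http://example.com/caf%C3%A9
    }
}
```

Note that this only helps once you have a URI object; a string containing the ideographic space still has to be escaped before it gets through the constructor.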