(Specifications: https://www.w3.org/TR/sparql11-query/#rIRIREF)
According to the specification, an IRIREF is defined by this production:
[139] IRIREF ::= '<' ([^<>"{}|^`\]-[#x00-#x20])* '>'
What is bothering me is this part of the expression:
\]-[
If I consider \ to be an escape character in the bracketed character class (which would be the case in a Perl regular expression), then the \ on its own is not a problem in the IRIREF and this is valid: <http://hello\world>
Then there is this big problem with the range ]-[. The character ] has an ordinal value of 93 and [ has an ordinal value of 91. This means we have an invalid range: 93 to 91. This is not allowed in most regex engines I tested.
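As a rough illustration of that behaviour, here is a minimal sketch using Python's re module. It is only one engine, picked here as an assumption, and the helper name compile_error is made up for the example. The character class from production [139] is pasted in unchanged and read the Perl way.

```python
import re

# The character class from production [139], pasted unchanged into a
# Perl-style regex. Read this way, `\]` is an escaped ']' and `]-[`
# becomes a range whose start is greater than its end.
PATTERN = r'[^<>"{}|^`\]-[#x00-#x20]'


def compile_error(pattern):
    """Return the engine's error message, or None if the pattern compiles."""
    try:
        re.compile(pattern)
    except re.error as exc:
        return str(exc)
    return None


# Code points of the two range endpoints.
print(ord("]"), ord("["))  # 93 91

# A message describing the rejected range, rather than None.
print(compile_error(PATTERN))
```

Swapping the endpoints (a range from [ to ]) compiles without complaint, so it is the order of the endpoints that the engine objects to.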
What does that mean?
Should I consider the - as a regular character in the bracketed character class, then this is invalid IRIREF: <http://new-example.org>. It makes no sense.
Should I consider the range ]-[ null and this IRIREF is valid: <http://hello[world]>
What I think is more likely is that the range is inverted and is not a problem for w3c specifications, which means the characters [, \ and ] are invalid characters. This makes sense.

The SPARQL spec says that its grammar is written using the notation defined by the XML 1.1 specification.
In that notation, the right-hand side you quote,
'<' ([^<>"{}|^`\]-[#x00-#x20])* '>'
denotes a sequence of:
  a '<' character;
  zero or more characters matching the expression [^<>"{}|^`\]-[#x00-#x20], which is a set difference (the first class minus the second), where
    [^<>"{}|^`\] = any character other than '<', '>', '"', '{', '}', '|', '^', '`', or '\'; n.b. '\' is not an escape character in this notation (which has no escape characters at all), and
    [#x00-#x20] = any character in the range #x00 to #x20 inclusive (the control characters and the space);
  a '>' character.
This is a slightly odd way to write this pattern; it could equally well be written as [^<>"{}|^`\#x00-#x20]; I'm not sure why the editors wrote it the way they did.
So to answer your questions one by one:
Should I consider the - as a regular character in the bracketed character class, then this is invalid IRIREF: <http://new-example.org>. It makes no sense.
No. When A and B are expressions in this notation, A - B denotes any string in the language of A that is not also a string in the language of B. Here A and B are each character-class expressions, one negative and one positive.
You are right that it would make no sense to prohibit hyphens from a grammar rule intended to accept IRIs bracketed by angle brackets.
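To make the set-difference reading concrete, here is a small sketch that models the two character classes as Python sets of code points. It is restricted to ASCII purely to keep the sets small, and the names A, B and ALLOWED are invented for the illustration; none of this comes from the spec.

```python
# Universe for the illustration: ASCII only (an assumption made for brevity;
# the real production ranges over all characters).
ASCII = {chr(cp) for cp in range(0x80)}

# A = [^<>"{}|^`\] : everything except these nine literal characters.
EXCLUDED_BY_A = set('<>"{}|^`\\')
A = ASCII - EXCLUDED_BY_A

# B = [#x00-#x20] : the control characters and the space.
B = {chr(cp) for cp in range(0x00, 0x21)}

# "A - B" in the XML notation is ordinary set difference.
ALLOWED = A - B

# Report membership for the characters discussed in the question.
for ch in "-[]\\ ":
    print(repr(ch), ch in ALLOWED)
```

In this model the hyphen and both square brackets end up in ALLOWED, while the backslash (removed by A) and the space (removed by B) do not.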
Should I consider the range ]-[ null and this IRIREF is valid: <http://hello[world]>
']-[' does not denote a range here, null or otherwise; the ] ends the first character class expression and the [ begins the second.
What I think is more likely is that the range is inverted and is not a problem for w3c specifications, which means the characters [, \ and ] are invalid characters. This makes sense.
If my parsing of the expression is correct, '[' and ']' are legal (they are not excluded by the first expression, and they are not excluded by the second); '\' is excluded by the first expression.
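If that parsing is right, the production can be hand-translated into an ordinary regular expression. The sketch below does this for Python's re module. The translation, which folds both classes into one negated class and escapes the backslash, is my own assumption about an equivalent form, and is_iriref is a hypothetical helper name.

```python
import re

# Hand translation of production [139] into Python regex syntax, assuming the
# XML-notation reading: '\' is a literal member of the negated class, and the
# #x00-#x20 range is folded into the same class.
IRIREF = re.compile(r'<[^<>"{}|^`\\\x00-\x20]*>')


def is_iriref(text):
    """True if the whole string matches the translated IRIREF pattern."""
    return IRIREF.fullmatch(text) is not None


# The three examples from the question.
for candidate in (
    r"<http://new-example.org>",
    r"<http://hello[world]>",
    r"<http://hello\world>",
):
    print(candidate, is_iriref(candidate))
```

Under this reading the first two candidates are accepted (hyphen and square brackets are allowed) and the third is rejected because of the backslash.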