I am trying to read the W3C Recommendation for XML, and I found myself a bit puzzled by the language used to define things, the one that uses the ::= notation.
Most of the time those definitions look like regular expressions:
STag ::= '<' Name (S Attribute)* S? '>'
But from time to time I come across strange notation, like the following:
Comment ::= '<!--' ((Char - '-') | ('-' (Char - '-')))* '-->'
What does Char - '-' mean? Match anything that Char matches, excluding '-'?
Where can I find a formal definition of that language? I tried searching for "::=", but Google just ignores it. The W3C Recommendation itself doesn't seem to have any information on the matter.
It's one of the many variants of BNF (Backus-Naur Form), which, as you point out, has similarities to regular expressions.
The "except" operator ("-") is a little unusual, in my experience. (Char - '-') means "anything that matches Char and does not match '-'", that is, any character except a hyphen.
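If it helps to see it in regex terms, here is a rough Python sketch of the Comment production from the question. It assumes Char can be simplified to "any character" (the real Char production in the spec excludes some control characters), and the names COMMENT and is_comment are just made up for illustration:

```python
import re

# Assumption: Char is treated as "any character" for simplicity.
# (Char - '-')        ->  [^-]     any character except a hyphen
# '-' (Char - '-')    ->  -[^-]    a hyphen followed by a non-hyphen
COMMENT = re.compile(r"<!--(?:[^-]|-[^-])*-->")


def is_comment(text: str) -> bool:
    """Return True if the whole string matches the simplified Comment rule."""
    return COMMENT.fullmatch(text) is not None


print(is_comment("<!-- a-b -->"))   # single hyphen inside the body
print(is_comment("<!-- a--b -->"))  # double hyphen inside the body
```

The first call should be accepted and the second rejected, because the grammar only ever lets a hyphen in the body be followed by a non-hyphen. That is how the rule forbids "--" inside a comment.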
The particular flavour of BNF that the XML specification uses is described in section 6 of the spec: