Language that the W3C XML Recommendation uses to present definitions


I am trying to read the W3C Recommendation for XML, and I find myself a bit puzzled by the language used to define things, the one that uses the ::= notation.

Most of the time those definitions look like regular expressions:

STag       ::=      '<' Name (S Attribute)* S? '>'
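To make that resemblance concrete, the STag production can be translated almost symbol for symbol into a Python regular expression. This is only a sketch under simplifying assumptions: the helper names (S, NAME, EQ, ATT_VALUE, ATTRIBUTE, STAG) are hypothetical, Name is narrowed to ASCII, and AttValue ignores entity and character references, so the pattern accepts only a subset of what the real grammar allows.

```python
import re

# Simplified building blocks (assumptions): Name is restricted to ASCII and
# AttValue omits entity/character references, so this is narrower than the spec.
S = r"[ \t\r\n]+"                            # S ::= (#x20 | #x9 | #xD | #xA)+
NAME = r"[A-Za-z_:][A-Za-z0-9._:-]*"         # ASCII-only stand-in for Name
ATT_VALUE = '"[^<&"]*"' + "|" + "'[^<&']*'"  # AttValue without Reference
EQ = rf"(?:{S})?=(?:{S})?"                   # Eq ::= S? '=' S?
ATTRIBUTE = rf"{NAME}{EQ}(?:{ATT_VALUE})"    # Attribute ::= Name Eq AttValue

# STag ::= '<' Name (S Attribute)* S? '>'
# '*' and '?' carry over unchanged; quoted literals become literal characters.
STAG = re.compile(rf"<{NAME}(?:{S}{ATTRIBUTE})*(?:{S})?>")

for tag in ["<a>", "<img src=\"x.png\" alt='logo' >", '<p class="x"y="z">']:
    print(tag, bool(STAG.fullmatch(tag)))
```

The first two tags match; the last is rejected because (S Attribute)* requires whitespace before every attribute.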

But from time to time I come across strange notation, like the following:

Comment    ::=      '<!--' ((Char - '-') | ('-' (Char - '-')))* '-->'

What does Char - '-' mean? Match anything that Char matches excluding '-'?

Where can I find a formal definition of that language? I tried searching for "::=", but Google just ignores it. The W3C Recommendation itself doesn't have any information on the matter.


Solution

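It's one of the many variants of BNF (Backus-Naur Form), which, as you point out, has similarities to regular expressions. The spec itself calls it a simple Extended Backus-Naur Form (EBNF) notation.

The "except" operator ("-") is a little unusual, in my experience. (Char - '-') means "anything that matches Char and does not match '-'", that is, any character except a hyphen. The spec states the general rule as: A - B matches any string that is matched by A but is not matched by B. In the Comment production, this is what keeps "--" from appearing inside a comment.

The particular flavour of BNF that the XML specification uses is described in section 6 (Notation) of the spec:

https://www.w3.org/TR/REC-xml/#sec-notation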
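A minimal sketch of how the Comment production could be checked with Python's re module, assuming a regex is an acceptable stand-in for the EBNF. CHAR_CLASS, CHAR_EXCEPT_HYPHEN and COMMENT are illustrative names, not anything defined by the spec. Char is spelled out from its definition in the Recommendation, and the "except" operator is emulated with a negative lookahead.

```python
import re

# Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
CHAR_CLASS = r"\t\n\r\x20-\ud7ff\ue000-\ufffd\U00010000-\U0010ffff"

# (Char - '-'): the lookahead rules out '-', then the class matches one Char.
# This is an exact translation of "A - B" only because both sides match one character.
CHAR_EXCEPT_HYPHEN = rf"(?!-)[{CHAR_CLASS}]"

# Comment ::= '<!--' ((Char - '-') | ('-' (Char - '-')))* '-->'
COMMENT = re.compile(rf"<!--(?:{CHAR_EXCEPT_HYPHEN}|-{CHAR_EXCEPT_HYPHEN})*-->")

for text in ["<!-- hello -->", "<!-- a - b -->", "<!-- a--b -->", "<!-- x --->"]:
    print(repr(text), bool(COMMENT.fullmatch(text)))
```

The first two strings match. "<!-- a--b -->" fails because a '-' may only be followed by a non-hyphen Char, and "<!-- x --->" fails for the same reason: the body cannot end in a hyphen.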