Deciphering EBNF from XML specification


Take a look at the definition below. What exactly is this supposed to define? According to the EBNF specification, brackets [] define an optional item, so why is the * required? Isn't that superfluous (since it means a repetition of zero or more times)?

The second thing is, how do you interpret the part within parentheses? The - is the exclusion indicator, so does it mean excluding any of the items within parentheses, or the sequence of all three (zero or more from ^<&, followed by ]]>, followed by zero or more from ^<&)?

CharData ::= [^<&]* - ([^<&]* ']]>' [^<&]*)

Or am I completely mistaken, and this is something other than EBNF?

Thanks in advance


Solution

  • The XML specification does not strictly use EBNF as specified by ISO. Section 6 of the XML specification defines the notation it uses. Square brackets are used in a regex-like manner to denote a set of characters, not an optional element of the grammar, so the * after them is a genuine repetition operator rather than a redundancy. The - used for exclusion excludes the group within the parentheses as a whole, that is, anything matching the entire sequence. Thus, the line you quoted builds up as follows:

    • [^<&] - any character that is not a left angle bracket (<) or an ampersand (&)
    • [^<&]* - zero or more characters that are not left angle brackets or ampersands
    • [^<&]* - ([^<&]* ']]>' [^<&]*) - any string of zero or more characters that are not left angle brackets or ampersands, except those strings that contain the particular sequence of characters ]]> anywhere within them
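The breakdown above can be checked with a small script. This is only a sketch in Python, not anything taken from the XML specification: the function name is_char_data is made up for illustration, and the exclusion is modelled as "matches [^<&]* and does not contain ]]>", which is how the production is read in this answer.

```python
import re

# [^<&]* from the production: zero or more characters other than '<' and '&'
BASE = re.compile(r"[^<&]*")


def is_char_data(text: str) -> bool:
    """Hypothetical helper: does text match
    CharData ::= [^<&]* - ([^<&]* ']]>' [^<&]*) ?"""
    # Left-hand side of the '-': the whole string must match [^<&]*
    in_base = BASE.fullmatch(text) is not None
    # Right-hand side of the '-': [^<&]* strings that contain ']]>' somewhere
    in_excluded = in_base and "]]>" in text
    return in_base and not in_excluded


if __name__ == "__main__":
    for sample in ["hello world", "", "a < b", "x ]]> y", "]] >"]:
        print(repr(sample), is_char_data(sample))
```

Note that a lone > or a pair of ] characters is still accepted; only the exact three-character sequence ]]> (or any < or &) makes the string fall outside CharData.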