xml, specifications, well-formed

Is a colon a legal first character in an XML tag name?


According to the W3C XML Recommendation, a start-tag is defined as:

STag ::= '<' Name (S Attribute)* S? '>'

..where Name is:

Name ::= NameStartChar (NameChar)*
NameStartChar ::= ":" | [A-Z] | ...

..which states that a colon can appear as the first character of a name, suggesting that the following is a well-formed XML document:

<?xml version="1.0" ?><:doc></:doc>

..but every parser I have tried reports the colon as an error.

Also, Appendix B (though now a deprecated part of the document) explicitly states:

Characters ':' and '_' are allowed as name-start characters.

..and:

<?xml version="1.0" ?><_doc></_doc>

..is accepted by the XML parsers I've tried.

So, is a colon a legal first character in a tag name, meaning the parsers I'm using are wrong, or am I misreading the specification?


Solution

  • Yes, at the base XML level, colon (:) is allowed as a name-start character. The BNF rules you cite clearly specify this.

    However, the W3C XML Recommendation is clear that colons should not be used except for namespace purposes:

    Note:

    The Namespaces in XML Recommendation [XML Names] assigns a meaning to names containing colon characters. Therefore, authors should not use the colon in XML names except for namespace purposes, but XML processors must accept the colon as a name character.
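
Both halves of this note can be checked against a real parser. The sketch below uses Python's standard `xml.parsers.expat` bindings; the helper name `accepts` and the two document constants are made up for illustration. It assumes that expat without a namespace separator behaves as a plain XML 1.0 processor, and that passing a separator switches on namespace processing. Many parsers enable namespace processing by default, which would explain the errors reported in the question.

```python
from xml.parsers import expat


def accepts(document: str, namespace_aware: bool) -> bool:
    """Return True if expat parses `document` without raising an error.

    `accepts` is a hypothetical helper written for this example.
    """
    # namespace_separator=None gives a plain XML 1.0 parser; any
    # one-character separator turns on namespace processing.
    separator = " " if namespace_aware else None
    parser = expat.ParserCreate(namespace_separator=separator)
    try:
        parser.Parse(document, True)  # True marks the final chunk of input
    except expat.ExpatError:
        return False
    return True


COLON_DOC = '<?xml version="1.0" ?><:doc></:doc>'
UNDERSCORE_DOC = '<?xml version="1.0" ?><_doc></_doc>'

if __name__ == "__main__":
    for label, doc in (("colon", COLON_DOC), ("underscore", UNDERSCORE_DOC)):
        for ns in (False, True):
            mode = "namespace-aware" if ns else "plain XML 1.0"
            print(f"{label:10} {mode:15} accepted={accepts(doc, ns)}")
```

In plain mode the leading colon should be accepted, as the base grammar requires. In namespace-aware mode the same document is expected to be rejected, while the underscore document passes in both modes.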

    And the XML Namespaces BNF rules for tags are based on QName, which allows a colon in a name only as a separator between Prefix and LocalPart:

    QName          ::= PrefixedName | UnprefixedName
    PrefixedName   ::= Prefix ':' LocalPart
    UnprefixedName ::= LocalPart
    Prefix         ::= NCName
    LocalPart      ::= NCName
    NCName         ::= Name - (Char* ':' Char*) /* An XML Name, minus the ":" */
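
To make the difference between Name and QName concrete, here is a minimal sketch that encodes both productions as regular expressions. It is an ASCII-only approximation: the real NameStartChar and NameChar classes also cover many Unicode ranges, and the names `is_name` and `is_qname` are invented for this example.

```python
import re

# ASCII-only stand-ins for the spec's character classes (an assumption:
# the real productions also admit a large set of non-ASCII characters).
NC_START = r"[A-Za-z_]"        # NameStartChar without ':'
NC_CHAR = r"[A-Za-z0-9_.\-]"   # NameChar without ':'

# Name ::= NameStartChar (NameChar)*   (colon allowed anywhere)
NAME = re.compile(rf"(?::|{NC_START})(?::|{NC_CHAR})*")

# NCName ::= Name minus any name containing ':'
NCNAME = rf"{NC_START}{NC_CHAR}*"

# QName ::= (Prefix ':')? LocalPart, with Prefix and LocalPart both NCNames
QNAME = re.compile(rf"(?:{NCNAME}:)?{NCNAME}")


def is_name(s: str) -> bool:
    """True if `s` matches the base XML Name production (ASCII subset)."""
    return NAME.fullmatch(s) is not None


def is_qname(s: str) -> bool:
    """True if `s` matches the Namespaces QName production (ASCII subset)."""
    return QNAME.fullmatch(s) is not None


if __name__ == "__main__":
    for candidate in (":doc", "_doc", "ns:doc", "a:b:c", "doc:"):
        print(f"{candidate!r:10} Name={is_name(candidate)} QName={is_qname(candidate)}")
```

Under these patterns `:doc` is a Name but not a QName, because a QName needs a non-empty NCName on each side of its single colon. That is the distinction a namespace-aware parser enforces.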
    

    One might ask why colon wasn't disallowed in NameStartChar from the beginning. If we're lucky, C. M. Sperberg-McQueen may offer an authoritative explanation. However, I suspect it's a matter of an evolving notion of how namespaces were expected to be designed.

    The first published working draft in 1996 of the W3C XML Recommendation had a definition of STag which did not allow colon:

    STag  ::= '<' Name (S Attribute)* S? '>'
    Name  ::= (Letter | '-') (Letter | Digit | '-' | '.')*
    

    By 1998, colons were allowed in Name,

    Name  ::= (Letter | '_' | ':') (NameChar)*
    

    and an earlier form of the admonition about colon use read:

    Note: The colon character within XML names is reserved for experimentation with name spaces. Its meaning is expected to be standardized at some future point, at which point those documents using the colon for experimental purposes may need to be updated. (There is no guarantee that any name-space mechanism adopted for XML will in fact use the colon as a name-space delimiter.) In practice, this means that authors should not use the colon in XML names except as part of name-space experiments, but that XML processors should accept the colon as a name character.

    In other words, the need for namespaces was anticipated, but their precise form was perhaps not yet known when the colon was first allowed in names.