
Can an XML document start with anything other than a "<"?


Can an XML document start with anything other than a < character?

It was a random thought I had while trying to work out how to tell a string containing XML apart from one containing a path to an XML file.

I believe the answer is no, but I'm looking to be certain.


Solution

  • Only a < or a whitespace character can begin a well-formed XML document.

    The W3C XML Recommendation includes an EBNF grammar that definitively defines an XML document:

     [1] document ::= prolog element Misc*
    [22] prolog   ::= XMLDecl? Misc* (doctypedecl Misc*)?
    [23] XMLDecl  ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'
    [27] Misc     ::= Comment | PI | S
     [3] S        ::= (#x20 | #x9 | #xD | #xA)+
    

    From these rules it follows that an XML document may start either with a whitespace character or with the < character that opens one of the following constructs:

    • XML Declaration
    • Comment
    • PI
    • Doctype Declaration
    • Element

    An XML document may start with no other character.
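
For the original use case (telling an XML string apart from a path to an XML file), the rule above suggests a cheap first check. The following is a minimal Python sketch, not a validator: `looks_like_xml` is a hypothetical helper name, and it assumes the input has already been decoded to a `str`. It only tests the first significant character, so a string that passes may still be malformed XML, and on file systems that allow `<` in file names a path could in principle pass too.

```python
def looks_like_xml(text: str) -> bool:
    """Heuristic: True if the first significant character is '<'."""
    # Drop a leading BOM character if the decoder left one in place.
    if text.startswith("\ufeff"):
        text = text[1:]
    # Skip only the XML whitespace of production [3] S: space, tab, CR, LF.
    stripped = text.lstrip(" \t\r\n")
    return stripped.startswith("<")


if __name__ == "__main__":
    for candidate in ("<a/>", "  \n<!-- note --><a/>", "/data/input.xml", ""):
        print(repr(candidate), looks_like_xml(candidate))
```

Anything that fails this check cannot be a well-formed XML document, so it can be treated as a path (or rejected); anything that passes still needs a real parser to confirm it.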

    Notes:

    1. An implication of these rules is that if an XML document contains an XML declaration, the declaration must be the very first thing in the document, with nothing before it, not even whitespace (otherwise you could receive a somewhat cryptic error message). So, for XML documents with an XML declaration, the first character has to be a < and cannot be whitespace.
    2. A BOM may appear at the beginning of an XML document entity to indicate the byte order of the character encoding being used. These bytes (two in UTF-16, three in UTF-8) are typically not considered part of the XML document itself but rather part of the storage unit of the physical structure supporting the XML document. A BOM, along with an XML declaration, assists XML processors in character encoding detection. [Suggestion for BOM mention thanks to JonHanna]
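
Both notes can be checked against a real parser. The sketch below uses Python's standard `xml.etree.ElementTree` module; `parses` is a hypothetical helper introduced only for this illustration, and other parsers may word their errors differently or be stricter about encodings.

```python
import codecs
import xml.etree.ElementTree as ET


def parses(data) -> bool:
    """Return True if ElementTree accepts `data` as a well-formed document."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


# Note 1: whitespace before the root element is fine without an XML declaration...
leading_space_ok = parses("  \n<a/>")

# ...but whitespace before an XML declaration makes the document ill-formed.
space_before_decl_ok = parses(" <?xml version='1.0'?><a/>")

# Note 2: a UTF-8 BOM in front of the declaration belongs to the byte stream,
# not to the document, so the parser skips it.
bom_ok = parses(codecs.BOM_UTF8 + b"<?xml version='1.0' encoding='utf-8'?><a/>")

print(leading_space_ok, space_before_decl_ok, bom_ok)
```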