
What exactly are storage units in XML?


I am trying to understand more about the physical and logical structure of an XML document. The W3C specification says this about physical structures:

An XML document may consist of one or many storage units. These are called entities;...

So my question is:

  1. What exactly does "storage unit" refer to in this context?
  2. Is it meant from the perspective of an XML processor, i.e. how it would store and manipulate the XML document in memory, or does it refer to the persistent storage used to store the document?

Solution

  • An entity in XML and SGML represents a character stream. It can be an external entity, where the character content is read from another file or a network (HTTP) stream, or an internal entity, whose replacement text is part of the literal content of the document in which it is declared and referenced. An internal entity is declared in the document type declaration (DTD), for example in its internal subset, like this:

    <!ENTITY e "replacement text for e">
    

    and then used as the entity reference &e; in content, like this:

    <p> some text ... &e; ... other text </p>
    

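    An XML or SGML processor will then replace &e; with its replacement text, "replacement text for e". Entities are also used for other purposes: for example, parameter entities (referenced as %name;) are used only inside the DTD, and unparsed entities refer to non-XML data such as images.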

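    As to the second question, the entity concept is about where character data is "stored": in the document itself, in external files, or in network streams. It does not refer to a markup processor's internal memory representation. The document you hand to the parser is itself a storage unit, called the document entity; internal entities live inside it, while each external entity is a separate storage unit that the parser fetches when it meets a reference to it.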
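To make "one logical document, several storage units" concrete, here is a minimal sketch in Python using the standard library's `xml.sax` (the answer itself uses no programming language, so this choice is an assumption). The file names `book.xml` and `chapter1.xml`, the entity name `chapter1`, and the helper `parse_book` are hypothetical, chosen only for illustration. `book.xml` is the document entity; it declares the internal entity `e` from above plus an external entity whose text lives in a second file. Note that Python's SAX parser does not fetch external general entities by default (since Python 3.7.1), so the sketch enables `feature_external_ges` explicitly.

```python
import os
import tempfile
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges


class TextCollector(ContentHandler):
    """Records element names and character data in the order the parser reports them."""

    def __init__(self):
        super().__init__()
        self.elements = []
        self.chunks = []

    def startElement(self, name, attrs):
        self.elements.append(name)

    def characters(self, content):
        self.chunks.append(content)


# Hypothetical document entity: declares one internal and one external entity.
MAIN = """<?xml version="1.0"?>
<!DOCTYPE book [
  <!ENTITY e "replacement text for e">
  <!ENTITY chapter1 SYSTEM "chapter1.xml">
]>
<book><p>&e;</p> &chapter1;</book>
"""

# Hypothetical external parsed entity, stored in its own file.
CHAPTER = "<chapter>text stored in a second file</chapter>"


def parse_book(directory):
    """Write the two storage units to disk, then parse starting from the document entity."""
    main_path = os.path.join(directory, "book.xml")
    with open(main_path, "w", encoding="utf-8") as f:
        f.write(MAIN)
    with open(os.path.join(directory, "chapter1.xml"), "w", encoding="utf-8") as f:
        f.write(CHAPTER)

    parser = xml.sax.make_parser()
    # Opt in to resolving external general entities (off by default).
    parser.setFeature(feature_external_ges, True)
    collector = TextCollector()
    parser.setContentHandler(collector)
    # The relative system identifier "chapter1.xml" is resolved against book.xml's location.
    parser.parse(main_path)
    return collector


with tempfile.TemporaryDirectory() as tmp:
    collector = parse_book(tmp)

text = "".join(collector.chunks)
print(collector.elements)
print(text)
```

The content handler sees a single stream of elements and text (`book`, `p`, `chapter`), even though the characters came from two files and the replacement text of `e` came from the DTD inside `book.xml`. That single stream is the logical structure; the files are the physical storage units the specification is talking about.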