Empty DTD Element with extra info


I know how DTDs work: when I want to declare <br> (or <br/>), I write this:

<!ELEMENT br EMPTY>

But in this file (https://www.w3.org/MarkUp/html-spec/html.dtd), I found the following line:

<!ELEMENT BR    - O EMPTY>

Most of it looks the same, but what difference do the - and the O make? Where can I learn more about this?

Thanks


Solution

  • The - and O (letter O) characters are SGML tag omission indicators, and declare (to an SGML parser) that the start- and/or end-tag, respectively, for an element can be omitted (if O) or must be present (if -). The first - or O character is for the start-element tag, and the second is for the end-element tag.

    For example, a valid HTML document can look like this (where the "public identifier" is shortened to ... for clarity):

    <!DOCTYPE html PUBLIC "...">
    <title>A valid HTML document</title>
    <p>Body Text
    

    An SGML parser uses an HTML DTD (containing the element and attribute declarations for HTML) to infer the omitted tags, so that the example markup is parsed as

    <!DOCTYPE html PUBLIC "...">
    <html>
      <head>
        <title>A valid HTML document</title>
      </head>
      <body>
        <p>Body Text</p>
      </body>
    </html>
    

    Element declarations with tag omission indicators that would make an SGML parser arrive at the inferred and canonical markup are

    <!ELEMENT html O O (head,body)>
    <!ELEMENT head O O (title)>
    <!ELEMENT body O O (#PCDATA,p)+>
    <!ELEMENT p - O (#PCDATA)>
    

    The br element is a special case because it's an EMPTY element (HTML calls these "void elements", and has additional ones such as img and hr). In traditional SGML, an element with declared content EMPTY isn't allowed to have an end-element tag at all, so an end-tag omission indicator on it is redundant or even misleading.

    For a full SGML DTD for modern HTML (W3C HTML 5 and 5.1), see my project/paper linked from http://sgmljs.net/blog/blog1701.html .

    Note that HTML, since version 5, no longer publishes an SGML DTD or other formal grammar, and no longer cites the SGML standard (ISO 8879) as a normative reference. The declaration you stumbled upon comes from one of the older, SGML-based HTML specifications (HTML 4 or earlier).

    Edit: to clarify, you don't have to declare <br/> as EMPTY to use it in XML; in fact, you don't have to declare anything at all unless you want to validate the document. And since you asked where to learn more about SGML, see (our site) http://sgmljs.net/docs/sgmlrefman.html
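
To make the meaning of the two indicators concrete, here is a minimal Python sketch that reads them out of an element declaration. The function `parse_element_decl` and its result keys are hypothetical names chosen for illustration, not part of any SGML toolkit, and the regular expression only handles simple single-element declarations like the ones quoted above (no name groups, comments, or inclusion/exclusion exceptions).

```python
import re

# Hypothetical helper for illustration only; covers just the simple form
#   <!ELEMENT name [start-indicator end-indicator] content>
_DECL = re.compile(
    r"<!ELEMENT\s+(?P<name>[A-Za-z][\w.-]*)\s+"
    r"(?:(?P<start>[-Oo])\s+(?P<end>[-Oo])\s+)?"   # optional "- O" style pair
    r"(?P<content>.+?)\s*>\s*$",
    re.DOTALL,
)


def parse_element_decl(decl):
    """Split an element declaration into name, omission flags, and content."""
    m = _DECL.match(decl.strip())
    if m is None:
        raise ValueError("unsupported element declaration: %r" % decl)
    start, end = m.group("start"), m.group("end")
    return {
        "name": m.group("name"),
        # False when no indicators are present (as in an XML-style DTD).
        "has_indicators": start is not None,
        # "O" (letter O) means the tag may be omitted, "-" means it must be present.
        "start_tag_omissible": start is not None and start.upper() == "O",
        "end_tag_omissible": end is not None and end.upper() == "O",
        "content": m.group("content"),
    }


if __name__ == "__main__":
    for decl in (
        "<!ELEMENT BR    - O EMPTY>",
        "<!ELEMENT p - O (#PCDATA)>",
        "<!ELEMENT html O O (head,body)>",
        "<!ELEMENT br EMPTY>",
    ):
        print(parse_element_decl(decl))
```

For the BR declaration from the question, the first indicator (-) says the start tag is required and the second (O) says the end tag may be omitted. The XML-style declaration carries no indicators at all, which the sketch reports through `has_indicators`.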