
xsltproc: doctype for docbook


I have an XSLT stylesheet that generates DocBook XML. I used xsl:output to generate a DOCTYPE declaration for the DocBook output:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" version="1.0"
              doctype-public="-//OASIS//DTD DocBook XML V4.5//EN"
              encoding="utf-8"
              indent="no" />

The resulting XML file has an empty string where the system identifier belongs, so xmllint complains:

/path/docbk.xml:2: parser error : Content error in the external subset
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN" ""><book>
^

Is this an issue with xsltproc or with the XSLT stylesheet?


Solution

  • SGML allows a DOCTYPE with only a PUBLIC identifier, but XML requires a system identifier: you can have either a system ID alone, or a public ID together with a system ID, but not a public ID alone. The DocBook guide suggests:

    <!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
                   "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd">
    

    which would correspond to:

    <xsl:output method="xml" version="1.0"
                doctype-public="-//OASIS//DTD DocBook XML V4.5//EN"
                doctype-system="http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd"
                encoding="utf-8"
                indent="no" />
    

    In fact, xsltproc does have a bug here, but not the one you think. From the spec for xsl:output:

    If the doctype-system attribute is specified, the xml output method should output a document type declaration immediately before the first element. The name following <!DOCTYPE should be the name of the first element. If doctype-public attribute is also specified, then the xml output method should output PUBLIC followed by the public identifier and then the system identifier; otherwise, it should output SYSTEM followed by the system identifier. The internal subset should be empty. The doctype-public attribute should be ignored unless the doctype-system attribute is specified.

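    (Emphasis on the last sentence is mine.) With no doctype-system attribute, xsltproc should have ignored doctype-public and emitted no DOCTYPE at all, rather than one with an empty system identifier.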
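
    As a minimal sketch of the fix in context, a complete stylesheet with both identifiers might look like the following. The template body (a fixed book with placeholder titles and text) is hypothetical and not taken from the question; only the xsl:output element reflects the answer above.

    ```xml
    <?xml version="1.0" encoding="utf-8"?>
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <!-- Public and system identifiers are both given, so the
           generated DOCTYPE is a legal XML document type declaration. -->
      <xsl:output method="xml" version="1.0"
                  doctype-public="-//OASIS//DTD DocBook XML V4.5//EN"
                  doctype-system="http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd"
                  encoding="utf-8"
                  indent="no" />

      <!-- Hypothetical template: ignores the input document and emits
           a small fixed DocBook book, just to exercise xsl:output. -->
      <xsl:template match="/">
        <book>
          <title>Example</title>
          <chapter>
            <title>Placeholder chapter</title>
            <para>Placeholder content.</para>
          </chapter>
        </book>
      </xsl:template>
    </xsl:stylesheet>
    ```

    Run through xsltproc against any well-formed input, this should produce a DOCTYPE that carries both the public and the system identifier, and xmllint should then parse the result without the external-subset error.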