I'm automatically generating reStructuredText files that are rendered by Sphinx to several formats, including HTML. The reStructuredText files sometimes contain HTML special characters such as < that the HTML builder fails to escape, resulting in invalid HTML output. This prevents me from automating the documentation generation process, forcing me to manually fix the output files. A concrete example of the problem is:
<div class="line">
<code class="docutils literal notranslate">
<span class="pre">public</span>
</code>
<span class="xref std std-ref">heap(
</span>
</div>
It occurs on a heap(<) text fragment. The output currently must be manually fixed to:
<div class="line">
<code class="docutils literal notranslate">
<span class="pre">public</span>
</code>
<a class="reference internal" href="heap_1.html#heap-1">
<span class="std std-ref">heap(&lt;)</span>
</a>
</div>
I cannot find in the Sphinx documentation for the HTML builder any solution for this problem. Is there any workaround? Fixing the problem in the original text is not an option (the text is source code that must compile clean; escaping characters like < there would break its compilation). The corresponding reStructuredText file fragment is:
| **Extends:**
| ``public`` :ref:`heap(<) <heap/1>`
which is automatically generated from an XML file fragment:
<extends>
<name><![CDATA[heap(<)]]></name>
<functor><![CDATA[heap/1]]></functor>
<scope>public</scope>
<file><![CDATA[heap_1]]></file>
</extends>
Let's start by addressing the Hyperlink Targets that will be referenced. The following examples use:
Hyperlink Targets - reStructuredText Markup Specification.
Named hyperlink targets consist of an explicit markup start (".. "), an underscore, the reference name (no trailing underscore), a colon, whitespace, and a link block:
.. _hyperlink-name: link-block
Next, let's look at the references themselves:
Cross-referencing syntax - Roles.
(...) like in reST direct hyperlinks:
:role:`title <target>`
will refer to target, but the link text will be title.(...)
Cross-referencing arbitrary locations - Roles.
:ref:
(...) but you must give the link an explicit title, using this syntax:
:ref:`Link title <label-name>`
Now to the question. Take the following reST, together with a pair of the Named Hyperlink Targets mentioned above:
.. _hyperlink-name:
.. _hyperlink-name2/:
| **Extends:**
| ``private`` :ref:`some title <hyperlink-name>`
| **Extends:**
| ``private`` :ref:`some title <hyperlink-name2/>`
It produces the following XML doctree targets:
<target refid="hyperlink-name"></target>
<paragraph ids="hyperlink-name" names="hyperlink-name">
<target refid="hyperlink-name2"></target>
<paragraph ids="hyperlink-name2" names="hyperlink-name2/">
And the following XML doctree references:
<line><literal>private</literal>
<reference internal="True" refid="hyperlink-name">
<inline classes="std std-ref">some title</inline>
</reference>
</line>
<line><literal>private</literal>
<reference internal="True" refid="hyperlink-name2">
<inline classes="std std-ref">some title</inline>
</reference>
</line>
From these the following HTML is generated:
<p id="hyperlink-name">
<p id="hyperlink-name2">
<div class="line">
<code class="docutils literal notranslate">
<span class="pre">private</span>
</code>
<a class="reference internal" href="#hyperlink-name">
<span class="std std-ref">some title</span>
</a>
</div>
<div class="line">
<code class="docutils literal notranslate">
<span class="pre">private</span>
</code>
<a class="reference internal" href="#hyperlink-name2">
<span class="std std-ref">some title</span>
</a>
</div>
So far, the only thing that has been normalized is the forward slash in the refid corresponding to .. _hyperlink-name2/:, which was dropped. Given the syntax :ref:`Link title <label-name>`, this takes care of any problems with the label-name.
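The id normalization that turned hyperlink-name2/ into hyperlink-name2 can be modeled in a few lines. The sketch below is a hypothetical, ASCII-only approximation (simple_make_id is an illustrative name, not a library function) of what docutils does in docutils.nodes.make_id; the real function also handles non-ASCII characters. It maps heap/1 to heap-1, which is consistent with the href in the hand-fixed HTML from the question.

```python
import re


def simple_make_id(label: str) -> str:
    """Hypothetical, ASCII-only model of docutils' id normalization."""
    ident = label.lower()
    # Collapse whitespace, then replace each run of characters other than
    # a-z and 0-9 with a single hyphen.
    ident = re.sub(r"[^a-z0-9]+", "-", " ".join(ident.split()))
    # Drop leading hyphens/digits and trailing hyphens.
    ident = re.sub(r"^[-0-9]+|-+$", "", ident)
    return ident


for label in ("hyperlink-name", "hyperlink-name2/", "heap/1"):
    print(label, "->", simple_make_id(label))
# hyperlink-name -> hyperlink-name
# hyperlink-name2/ -> hyperlink-name2
# heap/1 -> heap-1
```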
Now let's try the full example:
| **Extends:**
| ``private`` :ref:`heap(<) <hyperlink-name2/>`
The above immediately makes Sphinx issue a warning:
C:\path_to_your_rest_file.rst:98: WARNING: undefined label: ) <hyperlink-name2/
build succeeded, 1 warning.
Look carefully at the warning! That's why your HTML was broken: you violated one of the few syntax rules of the Sphinx :ref: role. It's not an HTML builder problem, nor a reST parser problem. The first < (less-than sign) character marks the end of the Link title in the :ref: role and the beginning of the label-name. That's why the undefined label is ) <hyperlink-name2/ instead of just hyperlink-name2/.
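The split can be reproduced with a simplified model. This is a sketch under two assumptions: that the title ends at the first < not marked as escaped (which is what the warning shows), and that a backslash escape is represented internally as a NUL character placed before the escaped character. The regex and the helper names (mark_escapes, split_ref) are illustrative, modeled on how docutils and Sphinx behave rather than copied from either project.

```python
import re
from typing import Tuple

# Simplified model (assumption): the title is everything up to the first "<"
# that is not preceded by a NUL escape marker; the label is everything
# between that "<" and the final ">".
EXPLICIT_TITLE_RE = re.compile(r"^(.+?)\s*(?<!\x00)<(.*?)>$", re.DOTALL)


def mark_escapes(text: str) -> str:
    """Model a backslash-escaped character as NUL + character."""
    return re.sub(r"\\(.)", lambda m: "\x00" + m.group(1), text)


def split_ref(role_text: str) -> Tuple[str, str]:
    """Return (title, label) for the text inside :ref:`...`."""
    text = mark_escapes(role_text)
    m = EXPLICIT_TITLE_RE.match(text)
    if m is None:
        # No explicit title: the whole text is used as the label.
        plain = text.replace("\x00", "")
        return plain, plain
    # Strip the escape markers once the split has been decided.
    return m.group(1).replace("\x00", ""), m.group(2).replace("\x00", "")


print(split_ref(r"heap(<) <hyperlink-name2/>"))
# ('heap(', ') <hyperlink-name2/')
print(split_ref(r"heap(\<) <hyperlink-name2/>"))
# ('heap(<)', 'hyperlink-name2/')
```

The unescaped case yields exactly the bogus label from the warning, while the escaped case recovers the intended title and label.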
If you escape the < (less-than sign) character:
| **Extends:**
| ``private`` :ref:`heap(\<) <hyperlink-name2/>`
In the doctree, the Sphinx parser will already have converted the character to &lt;:
<line><literal>private</literal>
<reference internal="True" refid="hyperlink-name2">
<inline classes="std std-ref">heap(&lt;)</inline>
</reference>
</line>
The same holds after the HTML builder step:
<div class="line">
<code class="docutils literal notranslate">
<span class="pre">private</span>
</code>
<a class="reference internal" href="#hyperlink-name2">
<span class="std std-ref">heap(&lt;)</span>
</a>
</div>
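Once the title reaches the doctree intact, no special configuration is needed for valid HTML. As a rough illustration only, with Python's html.escape standing in for the HTML writer's own text escaping (it is not the docutils writer code), a correctly split title comes out escaped:

```python
import html

# Title as it reaches the writer once the backslash escape has been resolved.
title = "heap(<)"

# html.escape is only a stand-in here for the writer's own text escaping.
span = f'<span class="std std-ref">{html.escape(title, quote=False)}</span>'
print(span)
# <span class="std std-ref">heap(&lt;)</span>
```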
I cannot find in the Sphinx documentation for the HTML builder any solution for this problem.
There isn't one, neither in the docutils configuration nor in the Sphinx configuration, because neither has a setting to fix malformed reST or Sphinx roles.
Fixing the problem in the original text is not an option (the text is source code that must compile clean; escaping characters like < there would break its compilation).
You don't have to change the original source code. If you're generating XML -> XSLT -> reST, the final reST/Sphinx syntax has to be correct. So rewrite the XSLT or the XML for the :ref: role (or do some pre-processing on the reST before building it with Sphinx).
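A minimal sketch of the generation-side fix, written in Python rather than XSLT purely for illustration. It assumes the <extends> element layout shown in the question; escape_ref_title and extends_to_rst are hypothetical names. The name stays untouched in the XML (so the source code still compiles); only the reST output gets a backslash before the characters that matter inside the role: the backslash itself, the backtick that would close the interpreted text, and <.

```python
import xml.etree.ElementTree as ET


def escape_ref_title(title: str) -> str:
    """Backslash-escape characters that are significant in :ref:`title <label>`."""
    # Escape the backslash first so the escapes added below are not doubled.
    for ch in ("\\", "`", "<"):
        title = title.replace(ch, "\\" + ch)
    return title


def extends_to_rst(xml_fragment: str) -> str:
    """Turn an <extends> element (layout as in the question) into reST lines."""
    elem = ET.fromstring(xml_fragment)
    name = elem.findtext("name", default="")
    functor = elem.findtext("functor", default="")
    scope = elem.findtext("scope", default="")
    return (
        "| **Extends:**\n"
        f"| ``{scope}`` :ref:`{escape_ref_title(name)} <{functor}>`"
    )


fragment = """<extends>
<name><![CDATA[heap(<)]]></name>
<functor><![CDATA[heap/1]]></functor>
<scope>public</scope>
<file><![CDATA[heap_1]]></file>
</extends>"""

print(extends_to_rst(fragment))
# | **Extends:**
# | ``public`` :ref:`heap(\<) <heap/1>`
```

The same escape_ref_title helper could be used in a pre-processing pass over already generated reST instead, but escaping at generation time is simpler, because there the title and the label are still separate values and no parsing of the role text is needed.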