I'm trying to use the docutils package to convert ReST to HTML. This answer succinctly uses the docutils publish_* convenience functions to achieve this in one step. The ReST documents that I want to convert have multiple sections that I want to separate in the resulting HTML. As such, I want to break this process down into three steps: parse the ReST into a tree of nodes, separate the nodes as appropriate, and convert the nodes to HTML.
It's step three that I'm struggling with. Here's how I do steps one and two:
from docutils import utils
from docutils.frontend import OptionParser
from docutils.parsers.rst import Parser
# preamble
rst = '*NB:* just an example.' # will actually have many sections
path = 'some.url.com'
settings = OptionParser(components=(Parser,)).get_default_values()
# step 1
document = utils.new_document(path, settings)
Parser().parse(rst, document)
# step 2
for node in document:
    do_something_with(node)
# step 3: Help!
for node in filtered(document):
    print(convert_to_html(node))
I've found the HTMLTranslator class and the Publisher class. They seem relevant, but I'm struggling to find good documentation. How should I implement the convert_to_html function?
My problem was that I was trying to use the docutils package at too low a level. The package provides a higher-level interface for this sort of thing:
from docutils.core import publish_doctree, publish_from_doctree
rst = '*NB:* just an example.'
# step 1
tree = publish_doctree(rst)
# step 2
# do something with the tree
# step 3
html = publish_from_doctree(tree, writer_name='html').decode()
print(html)
Step one is now much simpler. That said, I'm still slightly dissatisfied with the result; I realise that what I really want is a publish_node function. If you know a better way, please do post it.
I should also note that I haven't managed to get this working with Python 3.
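For what it's worth, a rough approximation of the publish_node helper wished for above can be sketched by wrapping a copy of a node in a fresh document and handing that to publish_from_doctree. This is only a sketch under assumptions: publish_node is a hypothetical name (docutils has no such function), the sample ReST is made up, it assumes a reasonably recent docutils on Python 3, and the result is a complete HTML page rather than a bare fragment.

```python
from docutils import nodes, utils
from docutils.core import publish_doctree, publish_from_doctree

# Made-up sample input: two paragraphs with a sidebar between them.
SAMPLE_RST = """\
Some body text.

.. sidebar:: Aside

   Sidebar text.

More body text.
"""


def publish_node(node):
    """Hypothetical helper: render a single doctree node as a full HTML page."""
    # Put a copy of the node into a new, otherwise empty document so that
    # the tree it came from is left untouched.
    fragment = utils.new_document('<fragment>')
    fragment.append(node.deepcopy())
    html = publish_from_doctree(fragment, writer_name='html')
    # Depending on the output_encoding setting the result is bytes or str.
    return html.decode('utf-8') if isinstance(html, bytes) else html


tree = publish_doctree(SAMPLE_RST)
# Only top-level children are checked here; nested sidebars would need
# a recursive search.
sidebars = [child for child in tree.children if isinstance(child, nodes.sidebar)]
sidebar_pages = [publish_node(node) for node in sidebars]
```

Each entry in sidebar_pages still carries the usual head and stylesheet, so this does not give the clean per-node fragment I was after.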
What I was actually trying to do was extract all of the sidebar elements from the doctree so they can be handled separately from the main body of the article. This is not the sort of use case that docutils was intended to solve, hence no publish_node function.
Once I realised this, the correct approach was simple enough: generate the HTML using docutils, then extract the sidebar elements from it using BeautifulSoup.
Here's the code that got the job done:
from docutils.core import publish_parts
from bs4 import BeautifulSoup
rst = get_rst_string_from_somewhere()
# get just the body of an HTML document
html = publish_parts(rst, writer_name='html')['html_body']
soup = BeautifulSoup(html, 'html.parser')
# docutils wraps the body in a div with the .document class
# we can just dispose of that div altogether
wrapper = soup.select('.document')[0]
wrapper.unwrap()
# knowing that docutils gives all sidebar elements the
# .sidebar class makes extracting those elements easy
sidebar = ''.join(tag.extract().prettify() for tag in soup.select('.sidebar'))
# leaving the non-sidebar elements as the document body
body = soup.prettify()
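To try the approach above without a real article, the same steps can be wrapped in a function and run on a small inline sample. This is a sketch, not a definitive implementation: split_sidebars and SAMPLE_RST are names made up for illustration, and the guard around the wrapper is an added assumption in case the writer in use does not emit an element with the .document class.

```python
from bs4 import BeautifulSoup
from docutils.core import publish_parts

# Made-up sample input: two paragraphs with a sidebar between them.
SAMPLE_RST = """\
Some body text.

.. sidebar:: Aside

   Sidebar text.

More body text.
"""


def split_sidebars(rst):
    """Hypothetical helper: return (sidebar_html, body_html) for a ReST string."""
    html = publish_parts(rst, writer_name='html')['html_body']
    soup = BeautifulSoup(html, 'html.parser')
    # Drop the wrapper element if the writer produced one with .document.
    wrapper = soup.select_one('.document')
    if wrapper is not None:
        wrapper.unwrap()
    # Detach every element carrying the .sidebar class from the body.
    extracted = [tag.extract() for tag in soup.select('.sidebar')]
    sidebar_html = ''.join(tag.prettify() for tag in extracted)
    # Whatever remains in the soup is the non-sidebar body.
    return sidebar_html, soup.prettify()


sidebar_html, body_html = split_sidebars(SAMPLE_RST)
```

Here sidebar_html holds only the sidebar markup, while body_html keeps the two paragraphs.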