
Customising Pandoc writer element output


Is it possible to customise element outputs for a pandoc writer?

Given this reStructuredText input:

.. topic:: Topic Title

   Content in the topic

Pandoc's HTML writer generates:

<div class="topic">
   <p><strong>Topic Title</strong></p>
   <p>Content in the topic</p>
</div>

Is there a supported way to change the HTML output, for example replacing <strong> with <mark>, or adding another class to the parent <div>?

Edit: I've assumed the formatting is the writer's responsibility, but it may also be decided when the AST is created.


Solution

  • This is what pandoc filters are for. Possibly the easiest way is to use Lua filters, as those are built into pandoc and don't require additional software to be installed.

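    The basic idea is that you'd match on an AST element created by the reader from the input, and produce raw output for your target format instead. (The <strong> in the question comes from such an element: the rst reader puts the topic title into the AST as Strong, and the HTML writer merely renders it.) So if all Strong elements were to be output as <mark> in HTML, you'd write: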

    function Strong (element)
      -- the result will be the element's contents, which will no longer be 'strong'
      local result = element.content
      -- wrap contents in `<mark>` element
      result:insert(1, pandoc.RawInline('html', '<mark>'))
      result:insert(pandoc.RawInline('html', '</mark>'))
      return result
    end
    

    You'd usually want to inspect pandoc's internal representation by running pandoc --to=native YOUR_FILE.rst. This makes it easier to write a filter.
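    For the example input, the native output should show the topic as a Div with the class topic, which is where <div class="topic"> comes from. The other request in the question, adding a class to that <div>, can then be handled with a Div filter. A minimal sketch; the class name highlighted-topic is a hypothetical placeholder, not anything pandoc defines:

```lua
-- Add an extra class to every Div the rst reader created for a topic.
-- 'highlighted-topic' is a hypothetical class name; use whatever your CSS expects.
function Div (el)
  if el.classes:includes('topic') then
    el.classes:insert('highlighted-topic') -- appends after the existing classes
    return el
  end
  -- returning nil leaves all other Divs unchanged
  return nil
end
```

    Saved as, say, topic-class.lua (any file name works), it would be run with pandoc --lua-filter=topic-class.lua YOUR_FILE.rst -o out.html, after which the topic should be written as <div class="topic highlighted-topic">.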

    There is a similar question on the pandoc-discuss mailing list; it deals with LaTeX output, but is also about handling custom rst elements. You might find it instructive.


    Nota bene: the above can be shortened by using a feature of pandoc that outputs spans and divs with a class of a known HTML element as that element:

    function Strong (element)
      return pandoc.Span(element.content, {class = 'mark'})
    end
    

    But I think it's easier to look at the general case first.
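    Both filters above change every Strong element in the document. To limit the change to topic titles, the filter can match the topic Div instead and only touch its title. A minimal sketch, assuming the AST shape the HTML output suggests (a Div with class topic whose first block is a paragraph containing only the Strong title); confirm that shape with the native output for your pandoc version:

```lua
-- Use <mark> for topic titles only, leaving other Strong text untouched.
-- Assumption: a topic is a Div with class "topic" whose first block is a
-- Para holding a single Strong (the title).
function Div (el)
  if not el.classes:includes('topic') then
    return nil -- not a topic: leave this Div unchanged
  end
  local title = el.content[1]
  if title and title.t == 'Para' and #title.content == 1
      and title.content[1].t == 'Strong' then
    local strong = title.content[1]
    -- Rebuild the title paragraph with a Span of class "mark" in place of
    -- the Strong; this relies on the HTML writer rendering such spans as
    -- <mark>, the shortcut mentioned in the answer.
    el.content[1] = pandoc.Para({pandoc.Span(strong.content, {class = 'mark'})})
  end
  return el
end
```

    It is run the same way, with --lua-filter. Bold text anywhere else, including inside the topic body, keeps its <strong> rendering, because only the first block of each topic is rewritten.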