
Pandoc Lua filters: how to specify attributes for Span element


I have a Markdown document containing raw LaTeX commands. I am trying to use a Lua filter with Pandoc (2.0.1.1) to convert the LaTeX commands into something more portable. In particular, commands that specify the language of text should be converted into spans with a lang attribute. The problem is that I don't know how to pass the attributes to the pandoc.Span constructor. This is my attempt at a filter (filter.lua):

function RawInline(elem)
  if elem.format == "tex" then
    text = string.match(elem.text, "\\textspanish{(.+)}")
    if text then
      contents = {pandoc.Str(text)}
      attrs = pandoc.Attr("",{},{lang = "es-SP"})
      return pandoc.Span(contents, attrs)
    end
  else
    return elem
  end
end

Sample usage:

echo '\textspanish{hola}' | pandoc -f markdown -t native --lua-filter=filter.lua

The output is [Para [Span ("",[],[]) [Str "hola"]]], with no attributes on the span.

If I pass a name and/or class to pandoc.Attr, these come through, e.g., attrs = pandoc.Attr("name",{"class"},{lang = "es-SP"}) produces [Para [Span ("name",["class"],[]) [Str "hola"]]]. But the key-value attributes I pass as the third argument never appear in the output. What is the right way to pass attributes to pandoc.Attr?


Solution

  • This used to be one of the rough edges in the Lua filter implementation; it has since been ironed out and made more user-friendly, so the above example now works as expected.

    Background

    Internally, pandoc uses two-element tables to hold key-value pairs, so constructing attributes roughly looks like this:

    attrs = pandoc.Attr("", {}, {{"lang", "es-SP"}})
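Applied to the question, a filter for older pandoc versions would build the attribute list from such pairs. This is a sketch, assuming (as in the question) that the raw LaTeX reaches the filter with format "tex"; it has not been checked against 2.0.1.1 specifically. The anchored pattern is a small change from the question's version so that only a whole \textspanish{...} command matches.

```lua
-- filter.lua for older pandoc versions, where key-value attributes
-- must be given as a list of {key, value} pairs.
function RawInline(elem)
  if elem.format ~= "tex" then
    return nil -- returning nil leaves the element unchanged
  end
  -- Capture the argument of a raw \textspanish{...} command.
  local text = string.match(elem.text, "^\\textspanish{(.+)}$")
  if not text then
    return nil
  end
  -- Each attribute is a two-element table: {key, value}.
  local attrs = pandoc.Attr("", {}, {{"lang", "es-SP"}})
  return pandoc.Span({pandoc.Str(text)}, attrs)
end
```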
    

    Of course, this is not a great way to represent pairs. The reason for the current implementation is two-fold:

    1. It mirrors how pairs (and attributes in general) are encoded in pandoc's JSON output.
    2. These pairs have a fixed order.

    The second point matters when one wants to guarantee that the order of attributes won't change when passing through a filter. Lua does not define an iteration order for the keys of a table: the Lua table {one = 1, two = 2} could be read back into pandoc as the attribute list {one="1" two="2"} or as {two="2" one="1"}. The order of attributes shouldn't matter for most applications, but we cannot be sure. Hence the less-than-intuitive representation.
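Because pairs() may visit keys in any order, a filter that starts from a plain Lua table can impose an order of its own before building the pair list. The pure-Lua sketch below uses a hypothetical helper, to_pairs (not part of pandoc's API), that sorts the keys alphabetically so the result is the same on every run. It assumes all keys are strings, as attribute names are.

```lua
-- Hypothetical helper: turn a map such as {lang = "es-SP", dir = "ltr"}
-- into a list of {key, value} pairs with a deterministic (sorted) order.
function to_pairs(map)
  local keys = {}
  for k in pairs(map) do
    keys[#keys + 1] = k
  end
  table.sort(keys) -- fixes the order that pairs() leaves unspecified
  local result = {}
  for _, k in ipairs(keys) do
    result[#result + 1] = {k, map[k]}
  end
  return result
end

-- Usage: "dir" is always listed before "lang".
for _, pair in ipairs(to_pairs({lang = "es-SP", dir = "ltr"})) do
  print(pair[1], pair[2])
end
```

Under the pair-based representation, the result could be passed as the third argument, e.g. pandoc.Attr("", {}, to_pairs({lang = "es-SP"})). Sorting is only one possible choice; any fixed order would serve the same purpose.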

    Current state (pandoc 2.16 and later)

    The internal representation hasn't changed, but we have since improved the representation of Attr objects in Lua, extended the marshaling code, and added a Lua metatable. As a result, attribute tables such as {lang = "es-SP"} are now handled as expected. Furthermore, many users may find it more intuitive to use HTML-like attribute lists instead of "identifier, classes, attributes" triples. That is supported as well now:

    attr = pandoc.Attr{id='some-id', class="one two", lang='es-SP'}
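Conceptually, such an HTML-like table carries the same three parts as the triple: id becomes the identifier, class is split on whitespace into the list of classes, and every other key becomes a key-value attribute. The pure-Lua sketch below illustrates that correspondence with a hypothetical function, split_html_attrs; it is not pandoc's actual marshaling code, and sorting the remaining keys is an assumption made here for a stable result.

```lua
-- Hypothetical illustration: how an HTML-like attribute table maps onto
-- pandoc's (identifier, classes, attributes) triple.
function split_html_attrs(tbl)
  local identifier = tbl.id or ""
  -- "one two" -> {"one", "two"}
  local classes = {}
  for class in string.gmatch(tbl.class or "", "%S+") do
    classes[#classes + 1] = class
  end
  -- All other keys become {key, value} pairs, sorted for a stable order.
  local keys = {}
  for k in pairs(tbl) do
    if k ~= "id" and k ~= "class" then
      keys[#keys + 1] = k
    end
  end
  table.sort(keys)
  local attributes = {}
  for _, k in ipairs(keys) do
    attributes[#attributes + 1] = {k, tbl[k]}
  end
  return identifier, classes, attributes
end
```

For the example above, this yields the identifier "some-id", the classes "one" and "two", and the single pair {"lang", "es-SP"}.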
    

    In fact, it is not necessary to use the pandoc.Attr constructor at all; passing a plain table works too:

      return pandoc.Span(contents, {lang='es-SP'})
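Putting this together, the question's filter could be written for pandoc 2.16 or later roughly as follows. This is a sketch under the same assumptions as the question (raw LaTeX arrives with format "tex", and the argument is placed into a single Str); it uses local variables and an anchored pattern, but otherwise follows the original logic.

```lua
-- filter.lua: convert raw \textspanish{...} into a Span carrying a
-- lang attribute (pandoc 2.16 or later).
function RawInline(elem)
  if elem.format ~= "tex" then
    return nil -- leave non-LaTeX raw inlines untouched
  end
  local text = elem.text:match("^\\textspanish{(.+)}$")
  if not text then
    return nil -- some other LaTeX command; keep it as is
  end
  -- A plain table is accepted in place of pandoc.Attr.
  return pandoc.Span({pandoc.Str(text)}, {lang = "es-SP"})
end
```

Run with the same command as in the question, the resulting Span should now list ("lang","es-SP") among its attributes; the exact layout of the native output differs between pandoc versions.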