
How do I replace part of a string with a lua filter in Pandoc, to convert from .md to .pdf?


I am writing markdown files in Obsidian.md and trying to convert them to PDF via Pandoc and LaTeX. Plain text converts fine. However, in Obsidian I use ==equal signs== to highlight something, and this doesn't work in LaTeX.

So I'd like to create a filter that either removes the equal signs entirely, or replaces them with something LaTeX can render, e.g. \hl{something}. I think this would be the same process either way.

I have a filter that looks like this:

return {
  {
    Str = function (elem)
      if elem.text == "hello" then
        return pandoc.Emph {pandoc.Str "hello"}
      else
        return elem
      end
    end,
  }
}

This works: it replaces any instance of "hello" with an italicized version of the word. However, it only works with whole words. If "hello" were part of a longer word, the filter wouldn't touch it. Since the equal signs are read as part of a word, it won't touch those either.

How do I modify this (or, please, suggest another filter) so that it CAN replace and change parts of a word?

Thank you!


Solution

  • A string like Hello, World! becomes a list of inlines in pandoc: [ Str "Hello,", Space, Str "World!" ]. Lua filters don't make matching on that particularly convenient: the best method is currently to write a filter for Inlines and then iterate over the list to find matching items.

    For a complete example, see https://gist.github.com/tarleb/a0646da1834318d4f71a780edaf9f870.

    Assume we have already found the highlighted text and converted it to a Span with class mark. We can then convert that Span to LaTeX with

    function Span (span)
      if span.classes:includes 'mark' then
        return {pandoc.RawInline('latex', '\\hl{')} ..
          span.content ..
          {pandoc.RawInline('latex', '}')}
      end
    end
    

    Note that pandoc 3.0 and later support highlighted text out of the box when called with

    pandoc --from=markdown+mark ...
    

    E.g.,

    echo '==Hi Mom!==' | pandoc -f markdown+mark -t latex
    ⇒ \hl{Hi Mom!}
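
For older pandoc versions, the narrower case asked about in the question (changing only part of a single Str) can be sketched as follows. This is a minimal sketch, not the approach from the linked gist: it only handles a highlight that sits entirely inside one word, such as ==important==, because a multi-word highlight is spread over several Str and Space elements and needs the Inlines approach described above. The helper name split_marks is made up for this example, and the sketch assumes the highlighted text itself contains no equal signs.

```lua
-- Hypothetical helper (not part of pandoc): split a string into plain
-- and highlighted pieces. Returns a list of {text = ..., mark = bool}.
local function split_marks(text)
  local pieces = {}
  local pos = 1
  while pos <= #text do
    -- Look for the next ==...== run whose inner text has no '='.
    local s, e, inner = text:find('==([^=]+)==', pos)
    if not s then break end
    if s > pos then
      -- Plain text in front of the highlight.
      pieces[#pieces + 1] = {text = text:sub(pos, s - 1), mark = false}
    end
    pieces[#pieces + 1] = {text = inner, mark = true}
    pos = e + 1
  end
  if pos <= #text then
    -- Plain text after the last highlight (or the whole string).
    pieces[#pieces + 1] = {text = text:sub(pos), mark = false}
  end
  return pieces
end

-- Filter function: replace one Str with a list of inlines, wrapping
-- each highlighted piece in a Span with class "mark".
function Str (elem)
  local pieces = split_marks(elem.text)
  if #pieces == 1 and not pieces[1].mark then
    return nil -- no highlight found; keep the element unchanged
  end
  local result = {}
  for _, piece in ipairs(pieces) do
    if piece.mark then
      result[#result + 1] =
        pandoc.Span({pandoc.Str(piece.text)}, pandoc.Attr('', {'mark'}))
    else
      result[#result + 1] = pandoc.Str(piece.text)
    end
  end
  return result
end
```

Returning a list from the Str function splices the new inlines in place of the original element. The resulting Span elements still have to be turned into \hl{...}; one way is to put the Span filter shown above into a separate file and pass it as a second --lua-filter after this one, so that it runs in a later pass.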