
Quarto/pandoc filter using meta variable


I'm trying to write a Lua filter for Quarto/pandoc that removes all code blocks that do not match the target language defined in the YAML header of the document. This is the filter I have so far:

    taget_lang = nil

    function Meta(m)
      if m.taget_lang then
        taget_lang = pandoc.utils.stringify(m.taget_lang)
      end
      -- tostring() avoids a Lua error when taget_lang is still nil
      print("In Meta, taget lang is " .. tostring(taget_lang))
      return m
    end

    function CodeBlock(el)
      print("In CodeBlock, taget lang is " .. tostring(taget_lang))
      if taget_lang then
        if el.attr.classes[1] ~= taget_lang then
          return {}
        end
        return el
      end
    end

And this is an example Markdown (or rather Quarto) document:

    ---
    title: Some title
    author: Some author
    date: last-modified
    format:
      ipynb: 
        toc: false
        filters: 
          - langsplit.lua
    taget_lang: "python"
    ---
    
    Here is some text.
    
    ```{python test-py}
    print("some python code")
    ```
    
    ```{r test-r}
    print("some R code")
    ```

When I use quarto render test.qmd, I get this print output:

    nil
    nil
    nil
    nil
    In Meta, taget lang is python

And the rendered document contains all code, telling me that the CodeBlock function has no access to the taget_lang defined inside Meta. But this should work, based on the documentation. Any clues?

(I'm also unhappy with return {}, which returns an empty code block instead of nothing, but that's a separate issue)


Solution

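  • The pandoc documentation specifies a fixed execution order for the functions in a single filter: functions for inline elements run first, then those for block elements, then Meta, and finally Pandoc. Metadata is therefore processed after all blocks have been filtered, so taget_lang is only set after every CodeBlock element has already been visited.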
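
    To observe this ordering directly, a minimal diagnostic filter along these lines (the file name order.lua is only an example) logs each call to stderr without changing the document. Run with plain pandoc, e.g. pandoc test.md --lua-filter order.lua -o test.html, on a Markdown file with a YAML header and two code blocks, it should print CodeBlock twice, then Meta, then Pandoc. Under Quarto, its own filters and cell processing may produce additional output.

    ```lua
    -- order.lua: diagnostic sketch that only logs which filter
    -- functions pandoc calls, and in which order. Every function
    -- returns nil, so the document is left unchanged.
    function CodeBlock(el)
      io.stderr:write("CodeBlock\n")
    end

    function Meta(m)
      io.stderr:write("Meta\n")
    end

    function Pandoc(doc)
      io.stderr:write("Pandoc\n")
    end
    ```
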

    There are two ways to deal with this. One method is to filter the main Pandoc element, which gives more control:

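    function Pandoc (doc)
      -- a YAML string is typically read as MetaInlines, not a plain Lua
      -- string, so stringify it before comparing; the key must match the
      -- YAML header (the question's header uses taget_lang)
      local target_lang = doc.meta.target_lang
        and pandoc.utils.stringify(doc.meta.target_lang)
      if not target_lang then
        return nil -- no target language given: leave the document unchanged
      end
      -- doc:walk requires pandoc 2.17 or later
      return doc:walk {
        CodeBlock = function (cb)
          if cb.classes[1] ~= target_lang then
            return {} -- delete block
          end
        end
      }
    end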
    

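    The alternative is to control the execution order explicitly by returning a sequence of filters at the end of the filter file. Pandoc then runs each filter in the list as a separate pass, in order, so Meta has finished before any CodeBlock is visited: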

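    -- pass 1 reads the metadata, pass 2 filters the code blocks;
    -- Meta and CodeBlock are the functions defined in the question
    return {
      { Meta = Meta },
      { CodeBlock = CodeBlock },
    }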
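
    Put together as one self-contained file, the two-pass approach could look like the sketch below. It assumes the YAML key is renamed to target_lang; the helper names read_meta and filter_code are purely illustrative. The file must end with the return statement, and it is referenced under filters: in the YAML header exactly as in the question.

    ```lua
    -- langsplit.lua: two-pass sketch (assumes the YAML key target_lang;
    -- read_meta and filter_code are hypothetical helper names)
    local target_lang = nil

    -- pass 1: read the target language from the metadata
    local function read_meta(m)
      if m.target_lang then
        target_lang = pandoc.utils.stringify(m.target_lang)
      end
      -- returning nil leaves the metadata unchanged
    end

    -- pass 2: drop code blocks whose first class is not the target language
    local function filter_code(cb)
      if target_lang and cb.classes[1] ~= target_lang then
        return {} -- delete the block
      end
      -- returning nil keeps the block unchanged
    end

    return {
      { Meta = read_meta },
      { CodeBlock = filter_code },
    }
    ```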