How to combine some markdown (with yaml header) files into a single pandoc output


I want to combine a few Markdown files (0*.md, i.e. 02.md, 03.md and 05.md) with a file containing a title and abstract (index.md) into a single pandoc output. All the files start with a YAML header like:

---
title: title02 
abstract: abstract02
---

followed by text.

The result is produced with

pandoc -o readme.pdf index.md 0*.md 

and is nearly satisfactory. However, the title and abstract of the output are those of the last file, 05.md, not those of index.md. The titles of the 0*.md files, however, are not included.

Is there a possibility to achieve the desired output with a pandoc command, or do I need to resort to a program to combine the files? What can I change to improve the result?

I see some similarity to advice but the process is too involved; I need a procedure which runs automatically on different sets of similarly structured files.


Solution

  • Pandoc has a useful feature for cases like this: custom Lua readers. We can use one to process the input files exactly the way we need.

    The script below assumes that the first input file (index.md) is special, in that it defines the top-level metadata. All other input files are slightly modified: the title is converted into a top-level heading, and all other headings are shifted down one level to accommodate that. Likewise, the abstract is added as a separate section. You can customize this further if needed.

    To use the script below, save it to a file combine.lua, then call pandoc with pandoc --from=combine.lua index.md 01.md ....

    function Reader(inputs, opts)
      local doc = pandoc.Pandoc{}  -- the resulting document
    
      -- parse input as Markdown
      local parse = function (input)
        return pandoc.read(tostring(input), 'markdown', opts)
      end
    
      -- The first file is assumed to be special. Just use as-is.
      doc = doc .. parse(inputs:remove(1))
    
      -- Process each input file separately and merge it into the top-level
      -- document.
      for i, input in ipairs(inputs) do
        local part = parse(input)  -- parse already closes over opts
        -- add the title as a top-level heading
        if part.meta.title then
          doc.blocks:insert(pandoc.Header(1, part.meta.title))
          part.meta.title = nil  -- unset, so it won't conflict with main title
        end
        -- add the abstract under a new heading
        if part.meta.abstract then
          doc.blocks:insert(pandoc.Header(2, 'Abstract'))
          doc.blocks:extend(
            pandoc.utils.type(part.meta.abstract) == 'Inlines' and
            {pandoc.Plain(part.meta.abstract)} or
            part.meta.abstract
          )
          part.meta.abstract = nil  -- prevent conflicts
        end
        -- append the main contents to the result doc and merge all meta
        -- information. Shift headings in the part.
        doc = doc .. part:walk {
          Header = function (h)
            h.level = h.level + 1
            return h
          end
        }
      end
      return doc
    end
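
Since the question asks for a procedure that runs automatically on different sets of similarly structured files, the invocation can be wrapped in a small shell function. This is only a sketch: the function name build_readme and its two arguments are hypothetical, and it assumes each directory holds an index.md plus its 0*.md parts and that combine.lua is reachable at the given path.

```shell
#!/bin/sh
# Sketch: build readme.pdf for one directory of similarly structured files.
# build_readme and its arguments are hypothetical names, not part of pandoc.
build_readme() {
  dir=$1                      # directory containing index.md and the 0*.md parts
  reader=${2:-combine.lua}    # path to the custom Lua reader (assumed location)
  # index.md must come first: the reader treats the first input as special.
  # The shell expands 0*.md in sorted order, so 02.md precedes 03.md and 05.md.
  pandoc --from="$reader" -o "$dir/readme.pdf" "$dir/index.md" "$dir"/0*.md
}
```

Calling build_readme some/directory then writes some/directory/readme.pdf; a second argument selects a reader script stored elsewhere.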