Read YAML metadata from a Pandoc markdown file


Is it possible to extract Pandoc's metadata (title, date, etc.) from a markdown file without writing a Haskell filter or parsing the --to=json output?

The JSON output is particularly inconvenient for this, since a two-word title looks like:

$ pandoc -t json posts/test.md | jq '.meta | .title'
{
  "t": "MetaInlines",
  "c": [
    {
      "t": "Str",
      "c": "Test"
    },
    {
      "t": "Space"
    },
    {
      "t": "Str",
      "c": "post"
    }
  ]
}

so even after jq has extracted the title, we still have to reassemble the words ourselves, and emphasis, inline code, or any other formatting only makes that harder.
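To make the reconstruction problem concrete, here is a rough Python sketch of what flattening that structure back into plain text involves. The function name inlines_to_text is made up for this illustration, and the node shapes for Emph, Strong, and Code are assumptions based on Pandoc's JSON AST as I understand it; only Str and Space appear in the output above. Any other node type is simply skipped.

```python
def inlines_to_text(inlines):
    """Flatten a list of Pandoc inline nodes into a plain string.

    Hypothetical helper: handles only a few node types and
    silently drops everything else (links, math, raw HTML, ...).
    """
    parts = []
    for node in inlines:
        kind = node["t"]
        content = node.get("c")
        if kind == "Str":
            parts.append(content)
        elif kind in ("Space", "SoftBreak"):
            parts.append(" ")
        elif kind in ("Emph", "Strong"):
            # Assumed shape: "c" is a nested list of inline nodes.
            parts.append(inlines_to_text(content))
        elif kind == "Code":
            # Assumed shape: "c" is [attributes, code_text].
            parts.append(content[1])
    return "".join(parts)


# The title object printed by jq in the question.
title = {
    "t": "MetaInlines",
    "c": [
        {"t": "Str", "c": "Test"},
        {"t": "Space"},
        {"t": "Str", "c": "post"},
    ],
}

print(inlines_to_text(title["c"]))  # prints: Test post
```

Every further inline type needs its own branch, which is exactly the bookkeeping the question is trying to avoid.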


Solution

  • We can use the template variable $meta-json$ for this.

    Put the variable in a file (with an extension, to stop Pandoc looking in its own directories) and then use it with pandoc --template=file.ext.

    Pandoc's output is a JSON object with keys "title", "date", "tags", etc. and their respective values from the markdown document, which we can easily parse, filter, and manipulate with jq.

    $ echo '$meta-json$' > /tmp/metadata.pandoc-tpl
    $ pandoc --template=/tmp/metadata.pandoc-tpl input.md | jq '.title,.tags'
    "The Title"
    [
      "a tag",
      "another tag"
    ]
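If the metadata is needed inside a script rather than at the shell, the same trick can be wrapped in a few lines of Python. This is only a sketch: the names TEMPLATE, build_command, and read_metadata are invented here, and it assumes a pandoc executable on the PATH. It writes the one-line template to a temporary file, runs pandoc with it, and parses the result with the standard json module.

```python
import json
import subprocess
import tempfile
from pathlib import Path

# The whole template is just the $meta-json$ variable.
TEMPLATE = "$meta-json$\n"


def build_command(template_path, markdown_path):
    """Return the pandoc invocation as an argument list."""
    return ["pandoc", f"--template={template_path}", str(markdown_path)]


def read_metadata(markdown_path):
    """Run pandoc on markdown_path and return its metadata as a dict.

    Assumes pandoc is installed; raises CalledProcessError if it fails.
    """
    with tempfile.TemporaryDirectory() as tmp:
        # Keep an extension on the template file, as advised above.
        template_path = Path(tmp) / "metadata.pandoc-tpl"
        template_path.write_text(TEMPLATE, encoding="utf-8")
        result = subprocess.run(
            build_command(template_path, markdown_path),
            check=True,
            capture_output=True,
            text=True,
        )
    return json.loads(result.stdout)


# Example use (the path is a placeholder for your own file):
# meta = read_metadata("posts/test.md")
# print(meta.get("title"), meta.get("tags"))
```

Note that Pandoc renders the metadata values in the selected output format, so with the default HTML writer an emphasized title will contain tags such as <em>. Appending "--to=plain" to the argument list should give plain-text values instead.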