
Formatting divs using Pandoc


I am using Pandoc to convert Pandoc Markdown documents to HTML5. In my Markdown input, I write custom divs using Pandoc's fenced div syntax, for example:

::: Resources
A nice document
Informative website
:::

The resulting HTML is this:

<div class="Resources">
    <p>A nice document Informative website</p>
</div>

I would like the output to be something like this instead:

<div class="Resources">
    <div>A nice document</div>      
    <div>Informative website</div>
</div>

I.e., I want the two resources to be in two separate containers. I did not find a way to do that (Pandoc filters can probably do it, but I don't quite understand how to write them).

Thank you very much for any kind of help. Cheers.


Solution

  • If the main goal is to have separate Resource blocks, I'd suggest using a list inside the div:

    ::: Resources
    - A nice document
    - Informative website
    :::
    

    This will give

    <div class="Resources">
    <ul>
    <li>A nice document</li>
    <li>Informative website</li>
    </ul>
    </div>
    

    It's not what you want yet, but it gets us halfway there: every resource is already marked as a separate block. This simplifies the task of refining the document structure further through filtering. The following uses pandoc's Lua filter functionality; put the code into a file and pass it to pandoc via the --lua-filter command line parameter.

    local list_to_resources = {
      BulletList = function (el)
        local resources = {}
        local resource_attr = pandoc.Attr('', {'Resource'}, {})
        for i, item in ipairs(el.content) do
          resources[i] = pandoc.Div(item, resource_attr)
        end
        return resources
      end
    }
    
    function Div (el)
      -- return div unaltered unless it is of class "Resources"
      if not el.classes:includes'Resources' then
        return nil
      end
      return pandoc.walk_block(el, list_to_resources)
    end
    

    Calling pandoc with this filter will produce the desired structure (each inner div additionally carries the class "Resource"):

    <div class="Resources">
    <div class="Resource">
    A nice document
    </div>
    <div class="Resource">
    Informative website
    </div>
    </div>
    

    For the sake of completeness, I'll also add a solution to the question when taking it literally. However, I do not recommend using it for various reasons:

    1. It is far less "markdowny". Using only linebreaks to separate items is uncommon in Markdown and goes against its philosophy of having readable text without surprises.
    2. The necessary code is more complex and fragile.
    3. You won't be able to add additional information to the Resources div, as it will always be mangled by the filter. With the previous solution, only bullet lists have a special meaning.

    That being said, here's the code:

    -- table to collect elements in a line
    local elements_in_line = {}
    
    -- produce a span from the collected elements
    local function line_as_span()
      local span = pandoc.Span(elements_in_line)
      elements_in_line = {}
      return span
    end
    
    local lines_to_blocks = {
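      Inline = function (el)
        -- a SoftBreak ends the current line: emit the collected elements
        if el.t == 'SoftBreak' then
          return line_as_span()
        end
        -- otherwise collect the element and drop it from the output for now
        table.insert(elements_in_line, el)
        return {}
      end,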
    
      Para = function (el)
        local resources = {}
        local content = el.content
        -- last line is not followed by SoftBreak, add it here
        table.insert(content, line_as_span())
        local attr = pandoc.Attr('', {'Resource'})
        for i, line in ipairs(content) do
          resources[i] = pandoc.Div(pandoc.Plain(line.content), attr)
        end
        return resources
      end
    }
    
    function Div (el)
      -- return div unaltered unless it is of class "Resources"
      if not el.classes:includes'Resources' then
        return nil
      end
      return pandoc.walk_block(el, lines_to_blocks)
    end
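
Either filter is applied the same way on the command line. The following is only a minimal sketch of that step: the file names `input.md`, `resources.lua`, and `output.html` are placeholders that do not appear in the question, and the script assumes the first (bullet list) filter has been saved as `resources.lua` in the current directory. It writes the sample input itself and skips the conversion if pandoc or the filter file cannot be found.

```shell
#!/bin/sh
set -eu

# Placeholder file names (assumptions); adjust them to your project.
input=input.md
filter=resources.lua
output=output.html

# Sample input from the answer: a fenced div containing a bullet list.
cat > "$input" <<'EOF'
::: Resources
- A nice document
- Informative website
:::
EOF

# Run the conversion only if pandoc and the filter file are available.
if command -v pandoc >/dev/null 2>&1 && [ -f "$filter" ]; then
  pandoc --from markdown --to html5 --lua-filter "$filter" --output "$output" "$input"
  cat "$output"
else
  echo "pandoc or $filter not found; skipping conversion" >&2
fi
```

For the second filter, the input would instead list the resources on consecutive lines without bullets, as in the original question.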