
Specify Pandoc HTML numbering to start from <h2>


I want to convert a Markdown file to HTML with numbered headers, with the header levels starting from <h2>. How can I achieve this?

pandoc provides the option --number-sections (or -N) so headers are numbered in the output. Now I am trying to convert markdown to HTML with this option.

By default, the header levels in pandoc's HTML output start from <h1>. That is not ideal, so I want them to start from <h2>: the original Markdown may contain many first-level headers, but the output HTML should contain at most one <h1>.

It is possible to specify --shift-heading-level-by=1; then the output header levels start from <h2> (see Official Pandoc User's Guide and maybe also this question). However, this messes up the section numbering, because the numbering level shifts too. All sections end up under "0" (like 0.1, 0.2, 0.2.1, …) and no section is numbered 1.

pandoc provides another option, --number-offset=1, but all it does is offset the numbers, like "0.1"→"1.1". Then every section number starts with 1 and no section is numbered 2, which makes no sense. The initial prefix "1." is redundant and should be removed from all the section numbers, like 1.1→1, 1.1.4→1.4, 1.2.3→2.3, etc.
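
To make the intended mapping concrete, here is a small shell sketch that drops the first dotted component of a section number. It only illustrates the arithmetic of the desired result. The helper name strip_first_component is hypothetical, and rewriting numbers with sed is not something pandoc does, nor is it proposed here as the fix.

```shell
#!/bin/sh
# Illustration only: drop the first dotted component of a section number.
# strip_first_component is a hypothetical helper, not part of pandoc.
strip_first_component() {
  printf '%s\n' "$1" | sed 's/^[0-9][0-9]*\.//'
}

# Print the mapping for the examples mentioned in the question.
for n in 1.1 1.1.4 1.2.3; do
  printf '%s -> %s\n' "$n" "$(strip_first_component "$n")"
done
```

Run on the three examples above, this prints 1.1 -> 1, 1.1.4 -> 1.4, and 1.2.3 -> 2.3.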

For demonstration purposes, here is a sample markdown text file (abc.md)

%Test-md

# First Header (1) #

## Header (1-1) ##

# Second Header (2) #

## Header (2-2) ##

### Header (2-3) ###

and its HTML output (simplified), produced with:

pandoc -N --section-divs --shift-heading-level-by=1 -t html5 abc.md

<section id="first-header-1" data-number="0.1">
  <h2 data-number="0.1">0.1 First Header (1)</h2>
  <section id="header-1-1" data-number="0.1.1">
    <h3 data-number="0.1.1">0.1.1 Header (1-1)</h3>
  </section>
</section>
<section id="second-header-2" data-number="0.2">
  <h2 data-number="0.2">0.2 Second Header (2)</h2>
  <section id="header-2-2" data-number="0.2.1">
    <h3 data-number="0.2.1">0.2.1 Header (2-2)</h3>
    <section id="header-2-3" data-number="0.2.1.1">
      <h4 data-number="0.2.1.1">0.2.1.1 Header (2-3)</h4>
    </section>
  </section>
</section>

How can one make pandoc do the numbering in the ordinary way (1, 2, 2.1, 2.2, 2.2.1) yet output the HTML with the header level starting from <h2>?


Solution

  • Pandoc first shifts the headings, then does the numbering. That is not what we want here: the numbering should happen first. A pandoc Lua filter can be used to take control of this.

    The function pandoc.utils.make_sections performs the action that's triggered by passing --section-divs or --number-sections on the command line. Matching the effect of --shift-heading-level-by=1 is possible by modifying all Header elements manually:

    function Pandoc (doc)
      -- Create and number sections. Setting the first parameter to
      -- `true` ensures that headings are numbered.
      doc.blocks = pandoc.utils.make_sections(true, nil, doc.blocks)
    
      -- Shift the heading levels by 1 (the `walk` method needs pandoc 2.17+)
      doc.blocks = doc.blocks:walk {
        Header = function (h)
          h.level = h.level + 1
          return h
        end
      }
    
      -- Return the modified document
      return doc
    end
    

    To use the filter, save it to a file such as shifted-numbered-headings.lua and pass it to pandoc via the --lua-filter/-L option. The --number-sections/-N option must still be passed for the numbering to become visible, and --section-divs is still required to get <section> elements.

    pandoc \
        --lua-filter=shifted-numbered-headings.lua \
        --number-sections \
        --section-divs \
        ...
    

    The class that pandoc sets on the <section> elements will always reflect the actual tagging level: the <section> that wraps an <h2> heading will have class="level2", even if, conceptually, it is a first level heading. This may be confusing and, unfortunately, cannot be changed with a filter.