Tags: python, markdown, python-markdown

Replace Markdown heading tags with custom tags in Python-Markdown


We want to replace the default h tags that Markdown generates for # headings with a custom HTML tag. We use the Python-Markdown library to convert Markdown to HTML.

We tried registering an extension that detects H1 elements with the regex (#) (.*):

import markdown
from markdown.extensions import Extension
from markdown.inlinepatterns import SimpleTagPattern

class CustomHeadings(Extension):
    def extendMarkdown(self, md, md_globals):
        H1_RE = r'(#) (.*)'

        h1_tag = SimpleTagPattern(H1_RE, 'span class="h1"')
        md.inlinePatterns['h1'] = h1_tag

md_extensions = [CustomHeadings()]

# [...]

def ds_custom_markdown_parse(value):
    return markdown.markdown(value, extensions=md_extensions)

We want h{1-6} elements to be rendered as span class="h{1-6}". However, the Markdown parser still converts the string # This is a h1 to <h1>This is a h1</h1>. We expect the output to be <span class="h1">This is a h1</span>.


Solution

  • Headings are block-level elements and therefore are not parsed by inlinePatterns. Before running the inlinePatterns, Python-Markdown runs the BlockParser, which converts all of the block-level elements of the document into an ElementTree object. Each block-level element is then passed through the inlinePatterns one at a time, and the span-level elements are parsed.

    For example, given your heading # This is a h1, the BlockParser has already converted it to an H tag, <h1>This is a h1</h1>, and the inlinePatterns only see the text content of that tag: This is a h1.
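The ordering described above can be observed directly. This is a minimal sketch, assuming the Python-Markdown package is installed. The emphasis markers in the sample input are an addition for illustration: they show that the heading element is created first and that inline parsing then works only on its text content.

```python
import markdown

# The block parser turns the "# ..." line into an <h1> element first.
# Inline patterns then run on the text inside that element, which is
# why the emphasis is converted but the leading "#" is never seen by them.
html = markdown.markdown("# This is a *h1*")
print(html)  # <h1>This is a <em>h1</em></h1>
```

An inline pattern that looks for a leading # therefore never gets a chance to match, because the # has already been consumed by the time inline parsing starts.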

    You have a few options for addressing this:

    1. You could override the BlockProcessors which parse headings so that they create the elements you desire from the get-go.
    2. Or you could leave the existing block parser in place and create a TreeProcessor which steps through the completed ElementTree object and alters the elements by redefining the tag names in the relevant elements.

    Option 2 should be much simpler and is, in fact, the method used by a few existing extensions.
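As a rough illustration of option 2, the sketch below registers a tree processor that renames heading elements after the tree has been built. It assumes Python-Markdown 3.x, where extendMarkdown takes only md and processors are added with register(). The names HeadingToSpanTreeprocessor, HEADING_TAGS, and "heading_to_span" are made up for this example, and the priority value is a guess at "late enough", not a recommendation from the library.

```python
import markdown
from markdown.extensions import Extension
from markdown.treeprocessors import Treeprocessor

# Hypothetical helper: the tag names we want to rewrite.
HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingToSpanTreeprocessor(Treeprocessor):
    """Rename <hN> elements to <span class="hN"> in the finished tree."""

    def run(self, root):
        for element in root.iter():
            if element.tag in HEADING_TAGS:
                # Keep the heading level as a class, then swap the tag name.
                element.set("class", element.tag)
                element.tag = "span"


class CustomHeadings(Extension):
    def extendMarkdown(self, md):
        # A low priority is assumed here so that this runs after the
        # built-in inline processing has filled in the heading's content.
        md.treeprocessors.register(
            HeadingToSpanTreeprocessor(md), "heading_to_span", 5
        )


def ds_custom_markdown_parse(value):
    return markdown.markdown(value, extensions=[CustomHeadings()])


result = ds_custom_markdown_parse("# This is a h1")
print(result)
```

Because only the tag name and one attribute change, inline markup inside the heading and any attributes set by other extensions are left alone. Note that this overwrites an existing class attribute on the heading, which may or may not be what you want.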

    Full disclosure: I am the lead developer of the Python-Markdown project.