We want to replace the default h tags, introduced by Markdown using #, with a custom HTML tag. For parsing Markdown to HTML we use the Python library Markdown.
We have tried to register an extension that uses an H1 regex. This extension uses the regexp (#) (.*) for detecting H1 elements.
import markdown
from markdown.extensions import Extension
from markdown.inlinepatterns import SimpleTagPattern
class CustomHeadings(Extension):
    def extendMarkdown(self, md, md_globals):
        H1_RE = r'(#) (.*)'
        h1_tag = SimpleTagPattern(H1_RE, 'span class="h1"')
        md.inlinePatterns['h1'] = h1_tag

md_extensions = [CustomHeadings()]
# [...]
def ds_custom_markdown_parse(value):
    return markdown.markdown(value, extensions=md_extensions)
We want to have h{1-6} elements rendered as a span class="h{1-6}". But the Markdown parser still converts the string # This is a h1 to <h1>This is a h1</h1>. We expect the output to be <span class="h1">This is a h1</span>.
Headings are block-level elements and therefore are not parsed by inlinePatterns.
Prior to running the inlinePatterns, Python-Markdown runs the BlockParser, which converts all of the block-level elements of the document into an ElementTree object. Each block-level element is then passed through the inlinePatterns one at a time and the span-level elements are parsed.
For example, given your heading # This is a h1, the BlockParser has already converted it to an H tag, <h1>This is a h1</h1>, and the inlinePatterns only see the text content of that tag: This is a h1. The leading # is already gone by then, so your regex never has anything to match.
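The ordering can be seen with a small check. This is only an illustrative sketch: the heading line becomes an h1 element first, and the inline patterns then work on the text inside it (here, the emphasis markers).

```python
import markdown

# The block parser turns the "# ..." line into an <h1> element first;
# inline patterns then only process the text inside it (the *a* emphasis).
html = markdown.markdown("# This is *a* h1")
print(html)  # <h1>This is <em>a</em> h1</h1>
```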
You have a few options for addressing this:
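1. Override the BlockProcessors which parse headings so that they create the elements you desire from the get-go.
2. Use a Treeprocessor to step through the completed ElementTree and change the heading elements after the fact.
Option 2 should be much simpler and is, in fact, the method used by a few existing extensions.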
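One way to rewrite the headings after the block parser has run is a Treeprocessor that walks the finished ElementTree. What follows is a minimal sketch, not the implementation used by any particular extension. It assumes Python-Markdown 3.x, where extendMarkdown takes only md and processors are added with register. The class name HeadingToSpanTreeprocessor, the registry name 'heading_to_span' and the priority 5 are illustrative choices. The priority is meant to place the processor after the built-in inline and prettify processors.

```python
import re

import markdown
from markdown.extensions import Extension
from markdown.treeprocessors import Treeprocessor

# Matches the tag names h1 through h6 and captures the level.
HEADING_TAG = re.compile(r'^h([1-6])$')


class HeadingToSpanTreeprocessor(Treeprocessor):
    """Hypothetical processor: turn <hN> into <span class="hN">."""

    def run(self, root):
        for element in root.iter():
            # Some nodes (e.g. comments) have a non-string tag; skip them.
            if not isinstance(element.tag, str):
                continue
            match = HEADING_TAG.match(element.tag)
            if match:
                element.set('class', 'h' + match.group(1))
                element.tag = 'span'
        # Returning None tells Python-Markdown the tree was edited in place.
        return None


class CustomHeadings(Extension):
    def extendMarkdown(self, md):
        # Assumed priority: lower than the built-in 'inline' (20) and
        # 'prettify' (10) processors, so this one runs after them.
        md.treeprocessors.register(
            HeadingToSpanTreeprocessor(md), 'heading_to_span', 5
        )


def ds_custom_markdown_parse(value):
    return markdown.markdown(value, extensions=[CustomHeadings()])


print(ds_custom_markdown_parse("# This is a h1"))
```

Running the processor late means the heading text has already been through the inline patterns, so emphasis or links inside a heading are kept when the tag is renamed.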
Full disclosure: I am the lead developer of the Python-Markdown project.