I am trying to create a Lua filter that preserves HTML comments (but no other HTML elements).
local function starts_with(start, str)
  return str:sub(1, #start) == start
end
function RawInline(el)
  if starts_with('<!--', el.text) then
    return el
  else
    return nil
  end
end
return {{Inline = RawInline}}
(Based on mb21's answer here: From HTML to Markdown: As clean Markdown markup as possible, and to preserve HTML comments.)
It doesn't currently work. What might be the problem?
pandoc -f html+raw_html from.html -o to.md -t gfm --lua-filter preserve-comments.lua
There are two small problems that prevent this filter from working. They are listed below, with an explanation and a solution for each.
The main issue is the line return {{Inline = RawInline}}. It causes the RawInline function to be called for all Inline elements, such as Str, Emph, Space, etc. This fails because some of these elements don't have a .text attribute, and calling starts_with with nil as the second argument triggers an error.
The solution is either to use return {{RawInline = RawInline}} or to leave the line out entirely. Both solutions are equivalent because pandoc constructs a filter from the global functions if no explicit filter table is returned.
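The error described above can be reproduced without pandoc. The following is a minimal sketch in plain Lua; the tables raw and space are hypothetical stand-ins for pandoc elements, not real pandoc objects.

```lua
local function starts_with(start, str)
  return str:sub(1, #start) == start
end

-- Hypothetical stand-ins for pandoc elements (plain tables, for illustration only).
local raw = { t = 'RawInline', format = 'html', text = '<!-- note -->' }
local space = { t = 'Space' }  -- like a real Space element, it has no text field

-- Works: the raw inline has a text field.
local is_comment = starts_with('<!--', raw.text)

-- Fails: space.text is nil, so str:sub(...) tries to index nil.
local ok, err = pcall(starts_with, '<!--', space.text)

print(is_comment, ok)
```

Registering the function under the RawInline key avoids the failing call, since the function is then only invoked for elements that carry a text field.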
The second issue is that the RawInline function does nothing, because return el and return nil do the same thing in this case. Not returning anything from a filter function causes pandoc to keep the object unaltered. Deleting an object is possible by returning an empty list, {}.
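The return-value rules can be illustrated with a small simulation. This is only a sketch of the behavior described above, not pandoc's actual implementation; the apply function and the sample element tables are hypothetical.

```lua
-- Hypothetical model of how a filter function's result is interpreted:
-- nil keeps the element, an element replaces it, a list is spliced in.
local function apply(filter_fn, inlines)
  local out = {}
  for _, el in ipairs(inlines) do
    local res = filter_fn(el)
    if res == nil then
      out[#out + 1] = el            -- nothing returned: keep unaltered
    elseif res.t then
      out[#out + 1] = res           -- single element: replace
    else
      for _, r in ipairs(res) do    -- list: splice in ({} deletes)
        out[#out + 1] = r
      end
    end
  end
  return out
end

local inlines = {
  { t = 'RawInline', text = '<!-- note -->' },
  { t = 'RawInline', text = '<span>' },
}

local n_el    = #apply(function(el) return el end, inlines)
local n_nil   = #apply(function(el) return nil end, inlines)
local n_empty = #apply(function(el) return {} end, inlines)

print(n_el, n_nil, n_empty)
```

Under this model, returning el and returning nil both leave every element in place, while returning {} removes it.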
To summarize, this should work:
local function starts_with(start, str)
  return str:sub(1, #start) == start
end
function RawInline(el)
  if not starts_with('<!--', el.text) then
    return {}
  end
end
To ensure that no HTML at all is included in the output, we can use gfm-raw_html as the output format, i.e., we disable the raw_html extension. This would also suppress HTML comments, so we modify the filter to pretend that these comments are raw Markdown, which is included verbatim.
local function starts_with(start, str)
  return str:sub(1, #start) == start
end
function RawInline (el)
  return starts_with('<!--', el.text)
    and pandoc.RawInline('markdown', el.text) -- pretend it's md
    or {} -- not an HTML comment, thus drop it
end
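To check the filter logic outside of pandoc, one option is to stub the single constructor it uses. The sketch below assumes a plain Lua interpreter; the stub table is a hypothetical stand-in that mimics only the format and text fields, and it is skipped when the real pandoc module is present.

```lua
-- Hypothetical stub so the sketch runs in plain Lua; inside pandoc the
-- real global module is used instead.
local pandoc = pandoc or {
  RawInline = function(format, text)
    return { t = 'RawInline', format = format, text = text }
  end,
}

local function starts_with(start, str)
  return str:sub(1, #start) == start
end

local function RawInline(el)
  return starts_with('<!--', el.text)
    and pandoc.RawInline('markdown', el.text) -- comment: re-tag as Markdown
    or {}                                     -- other raw HTML: drop
end

local kept = RawInline({ t = 'RawInline', format = 'html', text = '<!-- keep me -->' })
local dropped = RawInline({ t = 'RawInline', format = 'html', text = '<span>' })

print(kept.format, kept.text, #dropped)
```

The comment comes back tagged as Markdown with its text unchanged, while the other raw HTML element is replaced by an empty list.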