Here is my HTML file, which I want to convert to Markdown. Note the first line is a comment, which I want to preserve.
```html
<!-- https://fs.blog/feynman-technique/ -->
<h1 class="entry-title entry-title-single">The Feynman Technique: Master the Art of Learning</h1>
<div class="entry-content entry-content-single">
<p>The Feynman Technique is the most effective method to unlock your potential and develop a deep understanding. </p>
<p><a href="https://fs.blog/intellectual-giants/richard-feynman/">Richard Feynman</a> was not only a Nobel laureate in Physics but also a master of demystifying complex topics. His key learning insight: complexity and jargon often mask a lack of understanding. </p>
<p>Feynman’s learning technique comprises four key steps:</p>
<ol>
<li>Select a concept to learn.</li>
<li>Teach it to a child.</li>
<li>Review and refine your understanding.</li>
<li>Organize your notes and revisit them regularly.</li>
</ol>
<p>...</p>
<div class="wp-block-image">
<figure class="aligncenter"><img fetchpriority="high" decoding="async" width="1920" height="1080" src="https://149664534.v2.pressablecdn.com/wp-content/uploads/2012/04/FeynmanTechnique.jpg" alt="" class="wp-image-43131" srcset="https://149664534.v2.pressablecdn.com/wp-content/uploads/2012/04/FeynmanTechnique-300x169.jpg 300w , https://149664534.v2.pressablecdn.com/wp-content/uploads/2012/04/FeynmanTechnique-768x432.jpg 768w , https://149664534.v2.pressablecdn.com/wp-content/uploads/2012/04/FeynmanTechnique-1024x576.jpg 1024w , https://149664534.v2.pressablecdn.com/wp-content/uploads/2012/04/FeynmanTechnique-1536x864.jpg 1536w , https://149664534.v2.pressablecdn.com/wp-content/uploads/2012/04/FeynmanTechnique.jpg 1920w " sizes="(max-width: 1920px) 100vw, 1920px" /></figure></div>
<figure class="wp-block-pullquote"><blockquote><p>The person who says he knows what he thinks but cannot express it usually does not know what he thinks.</p><cite>Mortimer Adler</cite></blockquote></figure>
<h2 class="wp-block-heading">Step 1: Select a concept to learn.</h2>
<p>...</p>
```
My current solution is

```sh
pandoc from.htm -o to.md -t gfm-raw_html --wrap=none
```

and it gives me really neat markup without any garbage:
```markdown
# The Feynman Technique: Master the Art of Learning

The Feynman Technique is the most effective method to unlock your potential and develop a deep understanding.

[Richard Feynman](https://fs.blog/intellectual-giants/richard-feynman/) was not only a Nobel laureate in Physics but also a master of demystifying complex topics. His key learning insight: complexity and jargon often mask a lack of understanding.

Feynman’s learning technique comprises four key steps:

1. Select a concept to learn.
2. Teach it to a child.
3. Review and refine your understanding.
4. Organize your notes and revisit them regularly.

...

![](https://149664534.v2.pressablecdn.com/wp-content/uploads/2012/04/FeynmanTechnique.jpg)

> The person who says he knows what he thinks but cannot express it usually does not know what he thinks.
>
> Mortimer Adler

## Step 1: Select a concept to learn.

...
```
The problem is that it doesn't preserve HTML comments. Is there a way to fix this?
Your problem is that you're disabling `raw_html` with `-t gfm-raw_html`. The following preserves raw HTML (including comments, which the pandoc document AST represents simply as raw HTML):

```sh
pandoc -f html+raw_html -t gfm
```
Depending on what you want to achieve, you may also need a pandoc Lua filter that removes the raw HTML snippets that are not comments. Something like the following (untested), passed to pandoc with `--lua-filter`:
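```lua
-- Keep HTML comments, drop all other raw HTML, leave non-HTML raw content alone.
local function keep(el)
  if el.format:match('^html') then
    return el.text:match('^%s*<!%-%-') ~= nil  -- %- escapes '-' in Lua patterns
  end
  return true
end

function RawInline(el)
  if keep(el) then return el end
  return {}  -- an empty list deletes the element; returning nil would leave it unchanged
end

function RawBlock(el)
  if keep(el) then return el end
  return {}
end
```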
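If you'd rather not write Lua, pandoc also accepts JSON filters via `--filter`: external programs that read the document AST as JSON on stdin and write the modified AST to stdout. Below is a minimal Python sketch of the same idea as the Lua filter above. It assumes pandoc's JSON encoding of raw elements as `{"t": "RawBlock", "c": [format, text]}` (likewise for `RawInline`); the file name `keep_comments.py` is hypothetical.

```python
#!/usr/bin/env python3
"""Sketch of a pandoc JSON filter: keep raw HTML comments, drop all other raw HTML.

Assumes raw elements are encoded as {"t": "RawBlock" | "RawInline", "c": [format, text]}.
"""
import json
import sys

RAW_TYPES = ("RawBlock", "RawInline")


def is_droppable(node):
    """True for raw HTML elements that are not HTML comments."""
    if not (isinstance(node, dict) and node.get("t") in RAW_TYPES):
        return False
    fmt, text = node["c"]
    return fmt.startswith("html") and not text.lstrip().startswith("<!--")


def filter_doc(node):
    """Rebuild the AST recursively, removing droppable raw elements from every list."""
    if isinstance(node, list):
        return [filter_doc(item) for item in node if not is_droppable(item)]
    if isinstance(node, dict):
        return {key: filter_doc(value) for key, value in node.items()}
    return node


if __name__ == "__main__":
    # pandoc sends the AST on stdin (and the target format as argv[1], unused here).
    json.dump(filter_doc(json.load(sys.stdin)), sys.stdout)
```

Saved as `keep_comments.py` and made executable, it could be run with something like `pandoc from.htm -f html+raw_html -t gfm --wrap=none --filter ./keep_comments.py -o to.md`.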
But first, try `-t native` to inspect the document AST between the reader and the writer.
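To see only the raw HTML elements (and which of them are comments) rather than the whole native dump, a small script can scan the JSON form of the same AST. This is a sketch assuming the same `{"t": ..., "c": [format, text]}` encoding of raw elements; the script name `list_raw.py` is hypothetical.

```python
import json
import sys


def raw_html_nodes(node):
    """Yield (element type, text) for every raw HTML element in a pandoc JSON AST."""
    if isinstance(node, dict):
        if node.get("t") in ("RawBlock", "RawInline"):
            fmt, text = node["c"]
            if fmt.startswith("html"):
                yield node["t"], text
        for value in node.values():
            yield from raw_html_nodes(value)
    elif isinstance(node, list):
        for item in node:
            yield from raw_html_nodes(item)


if __name__ == "__main__":
    for kind, text in raw_html_nodes(json.load(sys.stdin)):
        label = "comment" if text.lstrip().startswith("<!--") else "other"
        print(f"{kind}\t{label}\t{text!r}")
```

For example: `pandoc from.htm -f html+raw_html -t json | python3 list_raw.py`.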