I am trying to convert an html document to docx using pandoc.
pandoc -s Template.html --mathjax -o Test.docx
During the conversion to docx everything goes smooth less the equations. In the html file the equation look like this:
<div class="jp-Cell jp-MarkdownCell jp-Notebook-cell">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser">
</div>
<div class="jp-InputArea jp-Cell-inputArea"><div class="jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput " data-mime-type="text/markdown">
\begin{equation}
\log_{10}(\mu)={-2.64}+\frac{4437.038}{T-544.391}
\end{equation}
</div>
</div>
</div>
</div>
After running the pandoc command the result in the docx document is:
\begin{equation} \log_{10}(\mu)={-2.64}+\frac{4437.038}{T-544.391} \end{equation}
Do you have idea how can I overcome this issue?
A Lua filter can help here. The code below looks for div elements with a data-mime-type="text/markdown"
attribute and, somewhat paradoxically, parses it context as LaTeX. The original div is then replaced with the parse result.
local stringify = pandoc.utils.stringify
function Div (div)
if div.attributes['mime-type'] == 'text/markdown' then
return pandoc.read(stringify(div), 'latex').blocks
end
end
Save the code to a file parse-math.lua
and let pandoc use it with the --lua-filter
/ -L
option:
pandoc --lua-filter parse-math.lua ...
As noted in a comment, this gets slightly more complicated if there are other HTML elements with the text/markdown
media type. In that case we'll check if the parse result contains only math, and keep the original content otherwise.
local stringify = pandoc.utils.stringify
function Div (div)
if div.attributes['mime-type'] == 'text/markdown' then
local result = pandoc.read(stringify(div), 'latex').blocks
local first = result[1] and result[1].content or {}
return (#first == 1 and first[1].t == 'Math')
and result
or nil
end
end