Converting HTML pages with equations to docx

I am trying to convert an html document to docx using pandoc.

pandoc -s Template.html --mathjax -o Test.docx

During the conversion to docx everything goes smoothly except for the equations. In the HTML file, an equation looks like this:

<div class="jp-Cell jp-MarkdownCell jp-Notebook-cell">
<div class="jp-Cell-inputWrapper">
<div class="jp-Collapser jp-InputCollapser jp-Cell-inputCollapser">
</div>
<div class="jp-InputArea jp-Cell-inputArea"><div class="jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput " data-mime-type="text/markdown">
\begin{equation}
\log_{10}(\mu)={-2.64}+\frac{4437.038}{T-544.391}
\end{equation}
</div>
</div>
</div>
</div>

After running the pandoc command the result in the docx document is:

\begin{equation} \log_{10}(\mu)={-2.64}+\frac{4437.038}{T-544.391} \end{equation}

Do you have any idea how I can overcome this issue?

Solution

A Lua filter can help here. The code below looks for div elements with a data-mime-type="text/markdown" attribute and, somewhat paradoxically, parses their content as LaTeX. The original div is then replaced with the parse result.

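local stringify = pandoc.utils.stringify

-- Pandoc's HTML reader strips the `data-` prefix, so the HTML attribute
-- `data-mime-type` is available here as `mime-type`.
function Div (div)
  if div.attributes['mime-type'] == 'text/markdown' then
    -- Re-read the div's plain text as LaTeX and replace the div with the result.
    return pandoc.read(stringify(div), 'latex').blocks
  end
end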

Save the code to a file parse-math.lua and let pandoc use it with the --lua-filter / -L option:

pandoc --lua-filter parse-math.lua ...
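To check that the filter does what is expected before converting the real document, a small self-contained test can help. The sketch below is only an illustration: the file name sample.html and the temporary directory are assumptions, and the HTML is a trimmed-down stand-in for the notebook cell in the question. It writes the sample and the filter to disk, prints pandoc's native representation (where the equation should appear as a Math element rather than raw \begin{equation} text), and then writes a docx file.

```shell
# Work in a throwaway directory so nothing in the current folder is touched.
workdir=$(mktemp -d)

# Minimal stand-in for the exported notebook cell (hypothetical file name).
cat > "$workdir/sample.html" <<'EOF'
<div class="jp-RenderedHTMLCommon jp-RenderedMarkdown jp-MarkdownOutput " data-mime-type="text/markdown">
\begin{equation}
\log_{10}(\mu)={-2.64}+\frac{4437.038}{T-544.391}
\end{equation}
</div>
EOF

# The filter shown above, saved under the name used in this answer.
cat > "$workdir/parse-math.lua" <<'EOF'
local stringify = pandoc.utils.stringify
function Div (div)
  if div.attributes['mime-type'] == 'text/markdown' then
    return pandoc.read(stringify(div), 'latex').blocks
  end
end
EOF

if command -v pandoc >/dev/null 2>&1; then
  # Inspect the parsed document; look for a Math element in the output.
  pandoc "$workdir/sample.html" --lua-filter "$workdir/parse-math.lua" -t native
  # Same filter, this time producing the docx file.
  pandoc -s "$workdir/sample.html" --lua-filter "$workdir/parse-math.lua" -o "$workdir/Test.docx"
else
  echo "pandoc not found on PATH; skipping conversion" >&2
fi
```

For the document from the question, the same options would be added to the original command, replacing sample.html with Template.html.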

As noted in a comment, this gets slightly more complicated if there are other HTML elements with the text/markdown media type. In that case we'll check if the parse result contains only math, and keep the original content otherwise.

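local stringify = pandoc.utils.stringify

function Div (div)
  if div.attributes['mime-type'] == 'text/markdown' then
    local result = pandoc.read(stringify(div), 'latex').blocks
    -- Inlines of the first parsed block (empty if there is none).
    local first = result[1] and result[1].content or {}
    -- Replace the div only if that block is a single Math element;
    -- returning nil leaves the original div untouched.
    return (#first == 1 and first[1].t == 'Math')
      and result
      or nil
  end
end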