I have a group of adoc documents that I'm converting to markdown. For most of them I've been able to convert them with:
asciidoc -b docbook -o temp.xml <infile>
pandoc -f docbook -t markdown_strict --atx-headers --mathjax temp.xml -o <outfile>
followed by some regex to clean up broken image links and fix the headers. However, this doesn't work for the inline math equations. In the adoc they use the syntax latexmath:[$some_equation_here$], sometimes without the dollar signs for multi-line equations.
When this gets turned into the DocBook XML, the equation seems to be preserved, in the following form:
<inlineequation>
<alt><![CDATA[$some_equation_here$]]></alt>
<inlinemediaobject><textobject><phrase></phrase></textobject></inlinemediaobject>
</inlineequation>
but when pandoc converts it to markdown it ignores these blocks of XML. How can I keep the equation in a markdown-readable format ($some_equation_here$) during the pandoc conversion? The --mathjax option doesn't seem to help with this.
I tried a separate Python regex, re.sub(r'latexmath:\[\$?(.*?)\$?\]', r'$\g<1>$', file_contents), to keep the $, but it produces some double-escaped text that then has to be fixed manually, and it doesn't always work, sometimes leaving extra /sup tags. Trying something similar on the XML file gave similar results.
Looking at the pandoc code, it seems that the DocBook reader expects the formula to be in a <mathphrase> element below <inlineequation>. Thus, replacing the <alt> tags with <mathphrase> is enough to get the equation picked up by pandoc. This yields invalid DocBook XML in general, as the <inlineequation> should contain either a <mathphrase> or <inlinemediaobject> elements, but that doesn't matter for pandoc.
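The tag replacement could be scripted between the asciidoc and pandoc steps. Below is a minimal Python sketch, assuming every equation looks like the snippet in the question (an <alt> element directly after the opening <inlineequation> tag, with no attributes on either). The function name alt_to_mathphrase is made up for illustration.

```python
import re

# Assumption: <alt> is the first child of <inlineequation> and neither tag
# carries attributes, as in the DocBook snippet from the question.
_ALT_IN_EQUATION = re.compile(
    r"(<inlineequation>\s*)<alt>(.*?)</alt>",
    re.DOTALL,  # let the equation body span several lines
)


def alt_to_mathphrase(docbook_xml: str) -> str:
    """Rename <alt> to <mathphrase> inside <inlineequation> elements.

    The CDATA payload is copied through unchanged, so any dollar signs
    it contains are still present afterwards.
    """
    return _ALT_IN_EQUATION.sub(r"\1<mathphrase>\2</mathphrase>", docbook_xml)


if __name__ == "__main__":
    import sys

    # Usage: python alt_to_mathphrase.py temp.xml > fixed.xml
    with open(sys.argv[1], encoding="utf-8") as handle:
        sys.stdout.write(alt_to_mathphrase(handle.read()))
```

Run it on temp.xml and hand the result to pandoc. <alt> elements that are not directly inside an <inlineequation> are left untouched.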
cat << EOF | pandoc --from=docbook --to markdown --lua-filter=unwrap-math.lua
<para>
<inlineequation>
<mathphrase><![CDATA[$some_equation_here$]]></mathphrase>
<inlinemediaobject><textobject><phrase></phrase></textobject></inlinemediaobject>
</inlineequation>
</para>
EOF
$some_equation_here$
Note that pandoc adds the dollar delimiters itself, so the ones already inside the CDATA should be removed as well. The above command uses a Lua filter to strip them; unwrap-math.lua contains:
-- Strip one leading and one trailing dollar sign from every Math element.
function Math (mth)
  mth.text = mth.text:gsub('^%$', ''):gsub('%$$', '')
  return mth
end