
Convert html mathjax to markdown with pandoc


I have some HTML files that include MathJax commands. I would like to convert them to PHP Markdown Extra using pandoc.

The problem is that pandoc adds a backslash ("\") before all math commands, for example \begin{equation}, \$, x\^2, etc.

Do you know how to avoid this with pandoc? I think a related question is this one: How to convert HTML with mathjax into latex using pandoc?


Solution

  • You can write a short Haskell program unescape.hs:

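    {-# LANGUAGE OverloadedStrings #-}
    -- Disable backslash escaping of special characters when writing strings to markdown.
    -- toJSONFilter lives in Text.Pandoc.JSON (pandoc-types); old releases spelled it toJsonFilter.
    import Text.Pandoc.JSON
    
    main :: IO ()
    main = toJSONFilter unescape
      where unescape (Str xs) = RawInline "markdown" xs  -- emit text verbatim, unescaped
            unescape x        = x                        -- leave every other inline untouched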
    

    Compile it with ghc --make unescape.hs, then run it as a filter between two pandoc invocations:

    pandoc -f html -t json | ./unescape | pandoc -f json -t markdown
    

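    Because every string is passed through as raw markdown, this disables escaping of special characters (like $) in the markdown output. Note that it applies to all text in the document, not only to math.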

    A simpler approach might be to pipe pandoc's normal markdown output through sed, stripping the backslash in front of $, ^, _ and *:

    pandoc -f html -t markdown | sed -e 's/\\\([$^_*]\)/\1/g'
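
To see what the sed expression does without running pandoc, here is a minimal sketch that wraps it in a shell function and feeds it a hand-written sample line. The function name unescape_math and the sample input are assumptions made for illustration; the input only imitates pandoc's escaped output and was not produced by pandoc.

```shell
# Hypothetical helper: wraps the sed expression from the answer above.
# It removes a backslash only when it directly precedes $, ^, _ or *.
unescape_math() {
  sed -e 's/\\\([$^_*]\)/\1/g'
}

# Hand-written sample imitating escaped markdown output (an assumption,
# not real pandoc output).
printf '%s\n' '\$ x\^2 + a\_i \$' | unescape_math

# A doubled backslash, as in an escaped \begin, is not matched by this
# expression and passes through unchanged.
printf '%s\n' '\\begin{equation}' | unescape_math
```

If pandoc also doubles the backslash in front of commands such as \begin, this expression leaves it in place, so that case would need a separate substitution or the Haskell filter above.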