Tags: javascript, markdown, abstract-syntax-tree

How can I parse Markdown into an AST, manipulate it, and write it back to Markdown?


I want to modify Markdown files programmatically.

I have been looking into Markdown parsers and tried a few of them, namely Marked, markdown-it and CommonMark. They give access to an AST, which lets me modify the content easily.

The problem is that they render to HTML only. I couldn't find any info on rendering back to Markdown.

I see two options right now: either write a custom renderer for one of these libraries (which would be quite time-consuming), or use a separate tool that converts HTML back to Markdown.
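To give a sense of what the custom-renderer route involves, here is a minimal sketch. It assumes a flat token array shaped like markdown-it's output (tokens with `type`, `markup` and `content` fields) and handles only ATX headings and paragraphs; the token list is hand-written for illustration rather than produced by markdown-it, and `renderTokensToMarkdown` is a hypothetical name.

```javascript
// Minimal Markdown serializer for a flat token stream shaped like
// markdown-it's (each token has `type`, `markup` and `content`).
// Assumption: only ATX headings and paragraphs are supported; the raw
// inline text of each `inline` token is copied through unchanged.
function renderTokensToMarkdown(tokens) {
  const out = [];
  for (const token of tokens) {
    switch (token.type) {
      case 'heading_open':
        out.push(token.markup + ' '); // markup holds the marker, e.g. "##"
        break;
      case 'inline':
        out.push(token.content);
        break;
      case 'paragraph_open':
        break; // paragraphs need no opening marker
      case 'heading_close':
      case 'paragraph_close':
        out.push('\n\n'); // blank line between blocks
        break;
      default:
        // Lists, code blocks, tables, etc. would each need their own case.
        throw new Error(`Unsupported token type: ${token.type}`);
    }
  }
  return out.join('').trimEnd() + '\n';
}

// Hand-written tokens standing in for a (modified) parser result.
const tokens = [
  { type: 'heading_open', markup: '##', content: '' },
  { type: 'inline', markup: '', content: 'Install' },
  { type: 'heading_close', markup: '##', content: '' },
  { type: 'paragraph_open', markup: '', content: '' },
  { type: 'inline', markup: '', content: 'Run *npm install*.' },
  { type: 'paragraph_close', markup: '', content: '' },
];

console.log(renderTokensToMarkdown(tokens));
// ## Install
//
// Run *npm install*.
```

Every other construct (lists, nested emphasis, code fences, escaping of special characters) would need its own case, which is where most of the effort would go.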

Is there an easier alternative? And why would a Markdown parser only render to HTML?


Solution

  • The best alternative is what you wanted to do in the first place!

    There are many Markdown parsers that produce ASTs, and a good number of those can render it back to Markdown!

    And why would a Markdown parser only render to HTML?

    Many of them do because the number one use of Markdown is as a source format for HTML; Markdown was designed for exactly that. So the most common use of a Markdown parser, even when people want to manipulate the AST first, is to output HTML.

    That said, the really good libraries include the option to render to other formats, including back to Markdown.

    Here are the libraries that I already know can do this:

    Pandoc

    Probably the number one Markdown toolkit in the world. Pandoc is written in Haskell, but there are JavaScript wrappers (just search npm). If you're going to do a lot of Markdown work down the road, it probably makes sense to become knowledgeable in Pandoc anyway.

    Its support for "filters" is all about AST manipulation. It has built-in support for Lua filters, which might be the easiest to write, but you can also write filters in other languages: Python, PHP, Perl, JavaScript/TypeScript, Groovy, Ruby.
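    A filter does not have to be written in Lua: Pandoc can emit its AST as JSON, so plain Node.js with no npm packages is enough. The sketch below assumes Pandoc's JSON layout, where each element looks like `{"t": ..., "c": ...}` and a `Header`'s contents are `[level, attr, inlines]`. The heading-demotion transform and the `--stdin` flag are illustrative choices of this script, not Pandoc features.

```javascript
// Sketch of a Pandoc JSON filter in plain Node.js (no npm packages).
// Assumed AST shape: elements are {t: "<Constructor>", c: <contents>},
// and a Header's contents are [level, attr, inlines].
const fs = require('fs');

// Recursively copy any JSON value, letting `visit` replace AST elements.
function walk(value, visit) {
  if (Array.isArray(value)) {
    return value.map((item) => walk(item, visit));
  }
  if (value !== null && typeof value === 'object') {
    const copy = {};
    for (const [key, child] of Object.entries(value)) {
      copy[key] = walk(child, visit);
    }
    // Objects with a string `t` field are Pandoc elements.
    return typeof copy.t === 'string' ? visit(copy) : copy;
  }
  return value; // strings, numbers, booleans, null
}

// Example transformation: demote every heading by one level (max 6).
function demoteHeaders(doc) {
  return walk(doc, (el) => {
    if (el.t === 'Header') {
      const [level, attr, inlines] = el.c;
      return { t: 'Header', c: [Math.min(level + 1, 6), attr, inlines] };
    }
    return el;
  });
}

// Act as a stdin -> stdout filter only when given this script's own
// (hypothetical) --stdin flag, so the functions can also be tested alone.
if (process.argv.includes('--stdin')) {
  const doc = JSON.parse(fs.readFileSync(0, 'utf8'));
  process.stdout.write(JSON.stringify(demoteHeaders(doc)));
}
```

    Assuming the script is saved as `demote.js`, a Markdown-to-Markdown round trip could then look like `pandoc -f markdown -t json input.md | node demote.js --stdin | pandoc -f json -t markdown -o output.md`. Because `walk` builds a copy, the parsed input object is never mutated.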

    It supports rendering to Markdown, among a huge number of other formats.

    Its parser and renderer have many other options that might make your job even easier, or may already do exactly what you want. There are also many community-written filters, one of which may already do what you need.

    CMark

    Though this reference implementation of CommonMark is written in C, there are many Node wrappers. There is even a port to JavaScript using Emscripten. That port also includes the GitHub extensions, so tables and other GFM features can be manipulated in the AST too.

    It can output CommonMark, as well as HTML and LaTeX, or even an XML representation of the AST.

    remark

    A JavaScript framework designed specifically around AST manipulation. I've never used it myself, but it is part of the unified ecosystem: it parses Markdown into an mdast syntax tree, lets plugins transform that tree, and can serialize the result back to Markdown.
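    In remark, manipulation typically happens in a plugin: a function that returns a transformer, which receives the mdast tree. The sketch below is dependency-free so the tree logic stands on its own; the plugin name `remarkHttpsLinks` and the http-to-https rewrite are hypothetical examples, and the tree is hand-built in mdast shape rather than produced by remark.

```javascript
// Dependency-free sketch of a remark-style plugin. In mdast, every node has
// a `type`, parent nodes have `children`, and `link`/`definition` nodes keep
// their target in `url`.

// Hypothetical plugin: rewrite http:// link targets to https://.
function remarkHttpsLinks() {
  // The plugin returns a transformer that is run on the parsed tree;
  // this one edits the tree in place.
  return function transformer(tree) {
    upgradeLinks(tree);
  };
}

// Depth-first walk over the tree, editing matching nodes as it goes.
function upgradeLinks(node) {
  const hasUrl = node.type === 'link' || node.type === 'definition';
  if (hasUrl && node.url.startsWith('http://')) {
    node.url = 'https://' + node.url.slice('http://'.length);
  }
  for (const child of node.children || []) {
    upgradeLinks(child);
  }
}

// Hand-built mdast tree for "See [the docs](http://example.com/docs)."
const tree = {
  type: 'root',
  children: [
    {
      type: 'paragraph',
      children: [
        { type: 'text', value: 'See ' },
        { type: 'link', url: 'http://example.com/docs', title: null,
          children: [{ type: 'text', value: 'the docs' }] },
        { type: 'text', value: '.' },
      ],
    },
  ],
};

remarkHttpsLinks()(tree);
console.log(tree.children[0].children[1].url); // https://example.com/docs
```

    With the real packages installed, the plugin would be wired in roughly as `remark().use(remarkHttpsLinks).process(markdown)`, and `String(file)` on the result gives the Markdown back; the exact import style differs between remark versions, so check its README before relying on this.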