java parsing markdown commonmark

CommonMark library for parsing Markdown: how to parse three backticks


I have this code:

import org.commonmark.node.Node;
import org.commonmark.parser.Parser;
import org.commonmark.renderer.html.HtmlRenderer;

public class TestCommons {
    public static void main(String[] args) {
        Parser parser = Parser.builder().build();
        Node document = parser.parse("*`Yes.` This* **is** ```\nSparta```");
        HtmlRenderer renderer = HtmlRenderer.builder().build();
        // Prints: <p><em><code>Yes.</code> This</em> <strong>is</strong> <code> Sparta</code></p>
        System.out.println(renderer.render(document));
    }
}

As you can see, it replaces both the single backticks and the triple backticks with a <code> tag. However, I'd like it to replace the triple backticks with a <pre> tag. Is there any way to achieve this? I only found this example on their GitHub:

Parser parser = Parser.builder().build();
HtmlRenderer renderer = HtmlRenderer.builder()
        .nodeRendererFactory(new HtmlNodeRendererFactory() {
            public NodeRenderer create(HtmlNodeRendererContext context) {
                return new IndentedCodeBlockNodeRenderer(context);
            }
        })
        .build();

Node document = parser.parse("*`Yes.` This* **is** ```\nSparta```");
renderer.render(document);
//<p><em><code>Yes.</code> This</em> <strong>is</strong> <code> Sparta</code></p>

class IndentedCodeBlockNodeRenderer implements NodeRenderer {

    private final HtmlWriter html;

    IndentedCodeBlockNodeRenderer(HtmlNodeRendererContext context) {
        this.html = context.getWriter();
    }

    @Override
    public Set<Class<? extends Node>> getNodeTypes() {
        // Return the node types we want to use this renderer for.
        return Collections.<Class<? extends Node>>singleton(IndentedCodeBlock.class);
    }

    @Override
    public void render(Node node) {
        // We only handle one type as per getNodeTypes, so we can just cast it here.
        IndentedCodeBlock codeBlock = (IndentedCodeBlock) node;
        html.line();
        html.tag("pre");
        html.text(codeBlock.getLiteral());
        html.tag("/pre");
        html.line();
    }
}

but it gives the same result, and it seems that only strings with multiple \n in front qualify for the <pre> tag.

What is the way to turn

"*`Yes.` This* **is** ```\nSparta```"

into

<p><em><code>Yes.</code> This</em> <strong>is</strong> <pre> Sparta</pre></p>

?


Solution

  • Note that there are two different types of content which can be surrounded by three backticks:

    1. Code spans can be delimited by any number of backticks, including three, so long as the opening and closing runs have the same length. Code spans are not block level and only get wrapped in <code> tags.
    2. Fenced code blocks require a fence of three or more backticks. The opening fence must start its own line (optionally followed by an info string such as a language name), and the closing fence must be on a line by itself. Fenced code blocks are block level and get wrapped in <pre><code> tags.
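
To check which of the two a given string produces, one option is to walk the parsed tree and count the node types. The sketch below assumes commonmark-java is on the classpath; `CodeNodeCounter` and its `count` method are hypothetical names introduced here for illustration, not part of the library.

```java
import org.commonmark.node.AbstractVisitor;
import org.commonmark.node.Code;
import org.commonmark.node.FencedCodeBlock;
import org.commonmark.node.Node;
import org.commonmark.parser.Parser;

// Hypothetical helper: counts inline code spans and fenced code blocks in a document.
class CodeNodeCounter extends AbstractVisitor {
    int codeSpans = 0;
    int fencedBlocks = 0;

    @Override
    public void visit(Code code) {
        // Code is the inline node produced by backtick-delimited code spans.
        codeSpans++;
    }

    @Override
    public void visit(FencedCodeBlock block) {
        // FencedCodeBlock is the block-level node produced by fences on their own lines.
        fencedBlocks++;
    }

    static CodeNodeCounter count(String markdown) {
        Node document = Parser.builder().build().parse(markdown);
        CodeNodeCounter counter = new CodeNodeCounter();
        document.accept(counter);
        return counter;
    }

    public static void main(String[] args) {
        CodeNodeCounter inline = count("*`Yes.` This* **is** ```\nSparta```");
        System.out.println(inline.codeSpans + " code spans, " + inline.fencedBlocks + " fenced blocks");

        CodeNodeCounter fenced = count("```\nSparta\n```\n");
        System.out.println(fenced.codeSpans + " code spans, " + fenced.fencedBlocks + " fenced blocks");
    }
}
```

For the string in the question, both backtick-delimited runs should be counted as code spans and no fenced block should be found.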

    I presume you are referring to fenced code blocks. Note that they require more line breaks than you have in your string, so your string is correctly being parsed as a code span. If you would like the code surrounded by three backticks to be recognized as a fenced code block, you need to add two line breaks, one before the opening fence and one before the closing fence:

    "*`Yes.` This* **is** \n```\nSparta\n```"
    

    Or, as a formatted string:

    *`Yes.` This* **is** 
    ```
    Sparta
    ```
    

    The above would get rendered by CommonMark as:

    <p><em><code>Yes.</code> This</em> <strong>is</strong></p>
    <pre><code>Sparta
    </code></pre>
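
When the Markdown is assembled in Java, as in the question, a small helper can make sure each fence ends up on its own line. This is only a sketch using the standard library (Java 11+ for `String.repeat`); `FenceHelper` and `fenced` are hypothetical names, not part of commonmark-java.

```java
// Hypothetical helper: wraps code in a backtick fence with each fence on its own line.
class FenceHelper {

    static String fenced(String code) {
        // Find the longest run of backticks inside the code so the fence can be made longer.
        int longest = 0;
        int run = 0;
        for (char c : code.toCharArray()) {
            run = (c == '`') ? run + 1 : 0;
            longest = Math.max(longest, run);
        }
        // A fence needs at least three backticks.
        String fence = "`".repeat(Math.max(3, longest + 1));
        // Make sure the closing fence starts a new line.
        String body = code.endsWith("\n") ? code : code + "\n";
        // The leading line break ends whatever text came before the block.
        return "\n" + fence + "\n" + body + fence + "\n";
    }

    public static void main(String[] args) {
        String markdown = "*`Yes.` This* **is**" + fenced("Sparta");
        System.out.println(markdown);
    }
}
```

Making the fence longer than any backtick run in the code is a design choice here, so that code which itself contains three backticks cannot close the block early.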
    

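    Note that all three line breaks matter, and removing any one of them changes the result. Without the line break before the opening fence, the first three backticks are left as literal text in the paragraph and the final three open an empty code block. Without the one after the opening fence, Sparta is read as the block's info string and the block is empty. Without the one before the closing fence, the block is never closed and runs to the end of the document. Your original string is parsed as a code span because neither run of backticks starts a line.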
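
If a bare <pre> without the inner <code> is wanted for fenced blocks, one approach is to adapt the renderer from the question so that it targets FencedCodeBlock instead of IndentedCodeBlock. The following is a sketch under that assumption, not the library's recommended way; `FencedPreRenderer` and its static `render` method are names made up for this example.

```java
import java.util.Collections;
import java.util.Set;

import org.commonmark.node.FencedCodeBlock;
import org.commonmark.node.Node;
import org.commonmark.parser.Parser;
import org.commonmark.renderer.NodeRenderer;
import org.commonmark.renderer.html.HtmlNodeRendererContext;
import org.commonmark.renderer.html.HtmlNodeRendererFactory;
import org.commonmark.renderer.html.HtmlRenderer;
import org.commonmark.renderer.html.HtmlWriter;

// Hypothetical renderer: emits <pre>...</pre> for fenced code blocks, without <code>.
class FencedPreRenderer implements NodeRenderer {

    private final HtmlWriter html;

    FencedPreRenderer(HtmlNodeRendererContext context) {
        this.html = context.getWriter();
    }

    @Override
    public Set<Class<? extends Node>> getNodeTypes() {
        // Only fenced code blocks are handled here; everything else uses the default renderer.
        return Collections.<Class<? extends Node>>singleton(FencedCodeBlock.class);
    }

    @Override
    public void render(Node node) {
        FencedCodeBlock block = (FencedCodeBlock) node;
        html.line();
        html.tag("pre");
        html.text(block.getLiteral()); // text() escapes HTML special characters
        html.tag("/pre");
        html.line();
    }

    static String render(String markdown) {
        Parser parser = Parser.builder().build();
        HtmlRenderer renderer = HtmlRenderer.builder()
                .nodeRendererFactory(new HtmlNodeRendererFactory() {
                    @Override
                    public NodeRenderer create(HtmlNodeRendererContext context) {
                        return new FencedPreRenderer(context);
                    }
                })
                .build();
        return renderer.render(parser.parse(markdown));
    }

    public static void main(String[] args) {
        System.out.println(render("*`Yes.` This* **is** \n```\nSparta\n```"));
    }
}
```

Because a fenced code block is block level, the <pre> element is emitted after the closing </p> rather than inside the paragraph, so the exact output asked for in the question (a <pre> nested in a <p>) is not produced by this approach.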