javascript jquery html markdown pagedown

Pagedown and Angle Brackets not Cooperating


I'm trying to use Pagedown to parse markdown (containing code sections) into html.

It mostly works, but I've noticed one strange behavior. If I have this in my markdown:

`ArrayList<String> names = new ArrayList<>();`

The text that displays ends up being this:

ArrayList<string> names = new ArrayList&lt;&gt;();

Notice that the first String is lower-cased, and the second <> is converted into HTML entities, which then show up literally because they end up inside a code block.

If I look at the markdown that Pagedown "thinks" it's supposed to process, it gets stranger:

`ArrayList<string> names = new ArrayList&lt;&gt;();`</string>

Obviously, it's treating the <String> section of the code text as an html tag, and adding a closing </string> tag. Parsing that markdown produces this html:

<code>ArrayList&lt;string&gt; names = new ArrayList&amp;lt;&amp;gt;();</code>

If I encode the angle brackets ahead of time:

`ArrayList&lt;String&gt; names = new ArrayList&lt;&gt;();`

Then Pagedown simply encodes the html entities as part of the code, which is exactly what I want it to do with the angle brackets:

<code>ArrayList&amp;lt;String&amp;gt; names = new ArrayList&amp;lt;&amp;gt;();</code>

I just want to be able to throw markdown (containing code sections) into the Pagedown parser and have it output sanitized html. Here is what I'm currently doing:

<!DOCTYPE html>
<html>
<head>
<script type="text/javascript" src="Markdown.Converter.js"></script>
<script type="text/javascript" src="Markdown.Sanitizer.js"></script>
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.2/jquery.min.js"></script>
<script>
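    function parseMarkdown() {
        // Read the markdown source back out of the page.
        var markdown = $("#markdown").html();

        console.log("markdown: " + markdown);

        // getSanitizingConverter() is a factory function, so no `new` is needed.
        var converter = Markdown.getSanitizingConverter();
        var html = converter.makeHtml(markdown);

        console.log("html: " + html);

        // Replace the markdown source with the rendered HTML.
        $("#markdown").html(html);
    }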
    $(parseMarkdown);
</script>
</head>
<body>
<div id="markdown">
`ArrayList<String> names = new ArrayList<>();`
</div>

</body>

</html>

In real life the markdown comes either from a database (where it was written using the Pagedown editor) or from markdown files (which were written in a basic text editor). Is there an extra step I'm missing? Does the above approach risk running malicious JavaScript before Pagedown gets to sanitize it?


Solution

  • The problem has nothing to do with Markdown or Pagedown.

    When you put content inside an HTML document, the HTML parser (the browser) "corrects" invalid HTML fragments. In your case, it treats <String> as the start tag of an unknown element, lowercases the tag name to "string" (HTML tag names are case-insensitive and are normalized to lowercase during parsing), and automatically adds a closing </string> tag to make the HTML well formed. The bare <> is not a valid tag, so it is kept as text, and .html() serializes it back as &lt;&gt;.

    As you described yourself, the Markdown source should probably come from a different source (JSON, external resources wrapped in a script element, etc.) where you will not have this problem at all.
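
If the markdown does have to be embedded in the page itself, one possible approach is to HTML-escape it when the page is generated, so the browser sees plain text rather than something that looks like a tag. The following is only a minimal sketch of that idea: `escapeHtml` and `markdownContainer` are hypothetical helper names made up for this illustration, not part of Pagedown or jQuery, and the element id is assumed to be a trusted literal.

```javascript
// Escape the characters the HTML parser treats specially, so that raw
// markdown can be written into an HTML page as plain text.
// The ampersand is replaced first so existing entities are not double-handled
// by the later replacements.
function escapeHtml(text) {
    return text
        .replace(/&/g, "&amp;")
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;");
}

// Hypothetical helper: build the element that carries the markdown source.
// The id is assumed to be a trusted literal and is not escaped here.
function markdownContainer(id, markdown) {
    return '<div id="' + id + '">' + escapeHtml(markdown) + '</div>';
}

var source = "`ArrayList<String> names = new ArrayList<>();`";
var container = markdownContainer("markdown", source);

// The angle brackets are now entities, so the browser will not turn
// <String> into an element when it parses the page.
console.log(container);
```

On the client, reading the element with `$("#markdown").text()` instead of `.html()` should then give back the original markdown with its angle brackets intact, and that string can be passed to `converter.makeHtml` as before.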