Reducing the capabilities of Markdown in Python


I'm writing a comment system. It needs a formatting system like Stack Overflow's.

Users can use some inline Markdown syntax, such as bold or italic. I thought I could handle that with regex replacements.

But there is another thing I have to do: users can create code blocks by indenting with 4 spaces. I don't think I can do this with regex, or parsing indents is too advanced for me :) Creating lists with regex replacements also looks impossible to me.

  • What would be the best approach for doing this?
  • Are there any Markdown libraries whose capabilities I can reduce? (For example, I'd like to remove table support.)
  • If I should write my own parser, should I write a finite state machine from scratch, or are there libraries that make it easier?
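
For reference, the two features described above (inline bold/italic and 4-space-indented code blocks) do not strictly need a state machine. The following is only a minimal standard-library sketch, not a Markdown implementation: the names `render_inline` and `render_comment` are hypothetical, blocks are assumed to be separated by blank lines, and lists are not handled at all.

```python
import html
import re

BOLD = re.compile(r"\*\*(.+?)\*\*")
ITALIC = re.compile(r"\*(.+?)\*")
# One newline followed by at least one blank (or whitespace-only) line.
BLOCK_SEPARATOR = re.compile(r"\n(?:[ \t]*\n)+")


def render_inline(text):
    """Escape HTML first, then apply the bold and italic replacements."""
    text = html.escape(text, quote=False)
    text = BOLD.sub(r"<strong>\1</strong>", text)
    return ITALIC.sub(r"<em>\1</em>", text)


def render_comment(source):
    """Render paragraphs and 4-space-indented code blocks (hypothetical helper)."""
    rendered = []
    for block in BLOCK_SEPARATOR.split(source.strip("\n")):
        lines = block.split("\n")
        if all(line.startswith("    ") for line in lines):
            # Every line is indented: strip the indent and emit a code block.
            code = "\n".join(line[4:] for line in lines)
            rendered.append("<pre><code>" + html.escape(code, quote=False) + "</code></pre>")
        else:
            # Otherwise treat the block as one paragraph with inline formatting.
            paragraph = " ".join(line.strip() for line in lines)
            rendered.append("<p>" + render_inline(paragraph) + "</p>")
    return "\n".join(rendered)
```

A known limitation of this sketch is that a blank line inside an indented block splits it into two code blocks; cases like that are where a real Markdown library starts to pay off.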

Thanks for your time and your responses.


Solution

  • I'd just go ahead and use python-markdown and monkey-patch it. You can write your own build_block_parser() function and substitute it for the default one to disable some of the Markdown functionality:

    from markdown import blockprocessors as bp
    def build_block_parser(md_instance, **kwargs):
        """ Build the default block parser used by Markdown. """
        parser = bp.BlockParser(md_instance)
        parser.blockprocessors['empty'] = bp.EmptyBlockProcessor(parser)
        parser.blockprocessors['indent'] = bp.ListIndentProcessor(parser)
        # parser.blockprocessors['code'] = bp.CodeBlockProcessor(parser)
        parser.blockprocessors['hashheader'] = bp.HashHeaderProcessor(parser)
        parser.blockprocessors['setextheader'] = bp.SetextHeaderProcessor(parser)
        parser.blockprocessors['hr'] = bp.HRProcessor(parser)
        parser.blockprocessors['olist'] = bp.OListProcessor(parser)
        parser.blockprocessors['ulist'] = bp.UListProcessor(parser)
        parser.blockprocessors['quote'] = bp.BlockQuoteProcessor(parser)
        parser.blockprocessors['paragraph'] = bp.ParagraphProcessor(parser)
        return parser
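    # Rebind the module attribute. Caveat: any module that already ran
    # "from markdown.blockprocessors import build_block_parser" keeps its own
    # reference to the old function, so that name must be rebound there too.
    bp.build_block_parser = build_block_parser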
    

    Note that I've simply copied and pasted the default build_block_parser() function from the blockprocessors.py file, tweaked it a bit (inserting bp. in front of all the names from that module), and commented out the line where it adds the code block processor. The resulting function is then monkey-patched back into the module. A similar method looks feasible for inlinepatterns.py, treeprocessors.py, preprocessors.py, and postprocessors.py, each of which does a different kind of processing.

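    Rather than rewriting the function that sets up the individual parsers, as I've done above, you could also replace the parser classes themselves with do-nothing subclasses. These would still be registered, but their test() method always returns False, so they are never run. Because build_block_parser() looks up CodeBlockProcessor in the module namespace when it is called, the replacement applies to every parser built afterward. That is probably simpler: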

    from markdown import blockprocessors as bp
    class NoProcessing(bp.BlockProcessor):
        def test(self, parent, block):
            return False   # never invoke this processor
    
    bp.CodeBlockProcessor = NoProcessing
    

    There might be other Markdown libraries that more explicitly allow functionality to be disabled, but python-markdown looks like it is reasonably hackable.
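
As a possible alternative to monkey-patching, newer releases of python-markdown keep their processors in registries that can be edited per instance. The sketch below assumes Python-Markdown 3.0 or later (where `Registry.deregister()` exists) and reuses the processor names `code` and `hashheader` from the function above; the sample text and variable names are illustrative only.

```python
import markdown

# Assumption: Python-Markdown 3.0+, where block processors live in a Registry.
md = markdown.Markdown()
md.parser.blockprocessors.deregister("code")        # no indented code blocks
md.parser.blockprocessors.deregister("hashheader")  # no "# Heading" syntax

text = "# Title\n\nSome **bold** text.\n\n    indented line"

restricted_html = md.convert(text)       # only this instance is restricted
default_html = markdown.markdown(text)   # the library defaults are untouched
```

Only the `md` instance is affected, so other code that uses the library keeps the default behaviour. Table support, mentioned in the question, is an optional extension in python-markdown and stays off unless it is requested through the `extensions` argument.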