
Do not render images from unlisted hosts in Python-Markdown


I use Python-Markdown to render user-generated content. I'd like to turn images from external sources into links.

So I have a list of approved storage hosts:

storages = ['foo.com', 'bar.net']

and I need to replace

![](http://external.com/image.png)

with something like:

[http://external.com/image.png](http://external.com/image.png)

if the host is not in storages.

I tried editing the Markdown text before saving it to the database, but that is not a good solution: users may want to edit their content later and would discover that it had been modified. So I want to do the replacement at render time.


Solution

  • One solution to your question is demonstrated in this tutorial:

    from markdown.treeprocessors import Treeprocessor
    from markdown.extensions import Extension
    from urllib.parse import urlparse
    
    
    class InlineImageProcessor(Treeprocessor):
        def __init__(self, md, hosts):
            self.md = md
            self.hosts = hosts
    
        def is_unknown_host(self, url):
            url = urlparse(url)
            return url.netloc and url.netloc not in self.hosts
    
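        def run(self, root):
            for element in root.iter('img'):
                # Copy the attributes: element.clear() below may empty
                # the original dict, depending on the ElementTree implementation.
                attrib = dict(element.attrib)
                if self.is_unknown_host(attrib['src']):
                    tail = element.tail
                    # Reuse the element as a link so its position is kept.
                    element.clear()
                    element.tag = 'a'
                    element.set('href', attrib.pop('src'))
                    element.text = attrib.pop('alt')
                    element.tail = tail
                    # Carry over any remaining attributes (e.g. title).
                    for k, v in attrib.items():
                        element.set(k, v)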
    
    
    class ImageExtension(Extension):
        def __init__(self, **kwargs):
            self.config = {'hosts': [[], 'List of approved hosts']}
            super(ImageExtension, self).__init__(**kwargs)
    
        def extendMarkdown(self, md):
            md.treeprocessors.register(
                InlineImageProcessor(md, hosts=self.getConfig('hosts')),
                'inlineimageprocessor',
                15
            )
    

    Testing it out:

    >>> import markdown
    >>> from image_extension import ImageExtension
    >>> input = """
    ... ![a local image](/path/to/image.jpg)
    ... 
    ... ![a remote image](http://example.com/image.jpg)
    ... 
    ... ![an excluded remote image](http://exclude.com/image.jpg)
    ... """
    >>> print(markdown.markdown(input, extensions=[ImageExtension(hosts=['example.com'])]))
    <p><img alt="a local image" src="/path/to/image.jpg"/></p>
    <p><img alt="a remote image" src="http://example.com/image.jpg"/></p>
    <p><a href="http://exclude.com/image.jpg">an excluded remote image</a></p>
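
The question's own example, `![](http://external.com/image.png)`, has no alt text, so the link produced above would have empty text. Below is a minimal sketch of one way to fall back to the URL as the link text. It uses only the standard library and operates on a hand-built ElementTree standing in for the tree a Treeprocessor receives; the helper name `image_to_link` and the `APPROVED_HOSTS` constant are assumptions for illustration, not part of Python-Markdown or the tutorial.

```python
import xml.etree.ElementTree as etree
from urllib.parse import urlparse

# Hypothetical constant: the `storages` list from the question.
APPROVED_HOSTS = ['foo.com', 'bar.net']


def image_to_link(element, hosts):
    """Turn an <img> element into an <a> element if its host is not approved."""
    src = element.get('src', '')
    host = urlparse(src).netloc
    if not host or host in hosts:
        return  # local or approved image: leave it untouched
    attrib = dict(element.attrib)  # copy, since clear() may empty the original
    tail = element.tail
    element.clear()
    element.tag = 'a'
    element.set('href', attrib.pop('src'))
    # Fall back to the URL when the image has no alt text.
    element.text = attrib.pop('alt', '') or src
    element.tail = tail
    for key, value in attrib.items():
        element.set(key, value)


# A stand-in for the parsed document tree.
root = etree.fromstring(
    '<p><img alt="" src="http://external.com/image.png"/> and '
    '<img alt="ok" src="http://foo.com/a.png"/></p>'
)
# Materialize the matches first, because the loop body changes element tags.
for img in list(root.iter('img')):
    image_to_link(img, APPROVED_HOSTS)
print(etree.tostring(root, encoding='unicode'))
```

The same `or src` fallback could be dropped into the `run` method above in place of the plain `attrib.pop('alt')`.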
    

    Full disclosure: I am the lead developer of Python-Markdown. We needed another tutorial which demonstrated some additional features of the extension API. I saw this question and thought it would make a good candidate. Therefore, I wrote up the tutorial, which steps through the development process to end up with the result above. Thank you for the inspiration.
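
One caveat about the host check: `netloc` includes any port or credentials and keeps the original letter case, so `http://example.com:8080/...` or `http://EXAMPLE.com/...` would be treated as unknown even if `example.com` is approved. A possible variant, sketched here as a standalone function rather than the tutorial's method, compares `urlparse(url).hostname` instead. Whether subdomains should also be accepted is a policy decision this sketch does not make.

```python
from urllib.parse import urlparse


def is_unknown_host(url, hosts):
    """Return True if `url` names a host that is not in `hosts`."""
    # `hostname` is lowercased and excludes any port or credentials;
    # it is None for relative URLs such as /path/to/image.jpg.
    hostname = urlparse(url).hostname
    return hostname is not None and hostname not in hosts


hosts = ['example.com']
print(is_unknown_host('/path/to/image.jpg', hosts))                 # False (no host)
print(is_unknown_host('http://example.com:8080/image.jpg', hosts))  # False
print(is_unknown_host('http://EXAMPLE.com/image.jpg', hosts))       # False
print(is_unknown_host('//exclude.com/image.jpg', hosts))            # True
print(is_unknown_host('http://cdn.example.com/image.jpg', hosts))   # True (exact match only)
```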