Tags: python, html, security, code-injection, sanitization

Python: Safely render user-entered HTML code


In my Python/Flask application, I would like to safely accept user input and then render it on another page, similar to what is done on this website (see https://meta.stackexchange.com/questions/1777/what-html-tags-are-allowed-on-stack-exchange-sites).

Is there a Python library that properly sanitizes such input, or is there some other simple way to do it?


Solution

  • Take a look at bleach by Mozilla.

    Example

    import bleach
    html = """
    <h1> Page Title </h1>
    <script> alert("Boom!")</script>
    """
    allowed_tags = [
        'a', 'abbr', 'acronym', 'b', 'blockquote', 'br',
        'code', 'dd', 'del', 'div', 'dl', 'dt', 'em',
        'h1', 'h2', 'h3', 'hr', 'i', 'img', 'li',
        'ol', 'p', 'pre', 's', 'strong', 'sub', 'sup',
        'table', 'tbody', 'td', 'th', 'thead', 'tr', 'ul'
    ]
    # Attributes deemed safe
    allowed_attrs = {
        '*': ['class'],
        'a': ['href', 'rel'],
        'img': ['src', 'alt']
    }
    # Sanitize the HTML with bleach: tags outside the allowlist
    # are escaped (not removed) by default
    html_sanitized = bleach.clean(
        html,
        tags=allowed_tags,
        attributes=allowed_attrs
    )
    print(html_sanitized)
    

    Output

    <h1> Page Title </h1>
    &lt;script&gt; alert("Boom!")&lt;/script&gt;
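
The sanitized string still has to reach the page without being escaped a second time, because Flask's Jinja2 templates autoescape variables. A minimal sketch of that rendering step follows, assuming Flask, MarkupSafe and bleach are installed. The `render_user_html` helper, the `TEMPLATE` string and the short `ALLOWED_TAGS` allowlist are hypothetical names chosen for illustration, not part of the original answer.

```python
import bleach
from flask import Flask, render_template_string
from markupsafe import Markup

app = Flask(__name__)

# Hypothetical, deliberately small allowlist for this sketch
ALLOWED_TAGS = {'b', 'em', 'h1', 'p', 'strong'}

# Inline template; Jinja2 autoescapes {{ content }} unless it is Markup
TEMPLATE = "<article>{{ content }}</article>"


def render_user_html(raw_html: str) -> str:
    """Sanitize untrusted HTML, then mark the result as safe to render."""
    cleaned = bleach.clean(raw_html, tags=ALLOWED_TAGS)
    # Wrap in Markup only AFTER cleaning; never wrap the raw input
    return render_template_string(TEMPLATE, content=Markup(cleaned))


# render_template_string needs an application context outside a request
with app.app_context():
    page = render_user_html('<h1>Title</h1><script>alert("Boom!")</script>')
    print(page)
```

In a template file the same effect comes from `{{ content|safe }}`. Either way, the "safe" marking should only ever be applied to output that has already passed through the sanitizer.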
    

    I have used it in an example app for my Flask extension, Flask-MDE, which you are welcome to check out.
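
As for the "simple way" the question asks about: if users do not need any formatting at all, one option is to escape everything rather than allowlist tags. The sketch below uses only the standard library's `html.escape`. The `escape_user_text` helper is a hypothetical name, and the approach assumes that showing the input as literal text is acceptable.

```python
import html


def escape_user_text(text: str) -> str:
    """Escape every HTML-significant character so input renders as plain text."""
    # quote=True also escapes " and ', which matters inside attribute values
    return html.escape(text, quote=True)


user_input = '<h1> Page Title </h1><script> alert("Boom!")</script>'
print(escape_user_text(user_input))
```

Unlike bleach, this keeps no tags at all, so it fits comment fields or usernames better than rich-text content. Jinja2's default autoescaping in Flask templates already does the equivalent for any variable that is not explicitly marked safe.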