
Python html-sanitizer allow img tag


Hello guys, I am using the html-sanitizer Python package, but I'm unable to enable img tags because they are disabled by default.

I tried editing sanitizer.py (shown below) in site-packages, but still no luck.

DEFAULT_SETTINGS = {
    "tags": {
        "a",
        "h1",
        "h2",
        "h3",
        "strong",
        "em",
        "p",
        "ul",
        "ol",
        "li",
        "br",
        "sub",
        "sup",
        "hr",
        "img"
    },
    "attributes": {"a": ("href", "name", "target", "title", "id", "rel"), "img": ("src",)},
    "empty": {"hr", "a", "br"},
    "separate": {"a", "p", "li"},
    "whitespace": {"br"},
    "add_nofollow": False,
    "autolink": False,
    "sanitize_href": sanitize_href,
    "element_preprocessors": [
        # convert span elements into em/strong if a matching style rule
        # has been found. strong has precedence, strong & em at the same
        # time is not supported
        bold_span_to_strong,
        italic_span_to_em,
        tag_replacer("b", "strong"),
        tag_replacer("i", "em"),
        tag_replacer("form", "p"),
        target_blank_noopener,
    ],
    "element_postprocessors": [],
}

Can somebody help me out? I want to allow the img tag with only the src attribute.


Solution

  • Sanitizer won't use the values in DEFAULT_SETTINGS for any setting you pass explicitly through the settings= argument when creating Sanitizer(). That might be what's going on here, but I suspect the real problem is the "empty" setting.

    Sanitizer also removes elements that are empty, so, for example, <em></em> is cleaned to ''. That's nice, but <img .../> is also an empty element (that is, it has no children), so the sanitizer removes it too.

    You need to add img to the settings['empty'] set, along with the current {"hr", "a", "br"}.

    While you're at it, don't edit DEFAULT_SETTINGS in site-packages; instead, define your own settings based on a copy of it. For example:

    import copy

    import html_sanitizer

    # Make a deep copy: dict() only copies the outer dict, so the
    # .add()/.update() calls below would still modify DEFAULT_SETTINGS
    my_settings = copy.deepcopy(html_sanitizer.sanitizer.DEFAULT_SETTINGS)

    # Add your changes
    my_settings['tags'].add('img')
    my_settings['empty'].add('img')
    my_settings['attributes'].update({'img': ('src', )})

    # Use it
    s = html_sanitizer.Sanitizer(settings=my_settings)
    s.sanitize('<em><img src="/index.html"/></em>')
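To see both points of the answer in isolation, here is a self-contained model that uses only the standard library. It does not call html-sanitizer: DEFAULTS, prune_empty and sanitize are hypothetical stand-ins, and the pruning rule is a simplification of the behavior described above (an element with no children and no text is dropped unless its tag is in the "empty" set). It also shows why a plain dict() copy is not enough: the nested sets are shared with the original.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical stand-in for html_sanitizer.sanitizer.DEFAULT_SETTINGS,
# trimmed to two keys. Only "empty" is used by the pruning model below;
# "tags" is here to show that a shallow copy shares every nested set.
DEFAULTS = {
    "tags": {"a", "em", "p", "br", "hr"},
    "empty": {"hr", "a", "br"},
}


def prune_empty(parent, empty_ok):
    """Drop child elements with no children and no text unless their tag
    is in empty_ok. A simplified model, not html-sanitizer's own code."""
    for child in list(parent):
        prune_empty(child, empty_ok)  # post-order: judge children first
        has_content = len(child) > 0 or bool((child.text or "").strip())
        if has_content or child.tag in empty_ok:
            continue
        if child.tail:
            # Re-attach text that followed the removed element.
            siblings = list(parent)
            idx = siblings.index(child)
            if idx > 0:
                prev = siblings[idx - 1]
                prev.tail = (prev.tail or "") + child.tail
            else:
                parent.text = (parent.text or "") + child.tail
        parent.remove(child)


def sanitize(fragment, settings):
    """Parse an XHTML-style fragment, prune empty elements, re-serialize."""
    root = ET.fromstring(f"<root>{fragment}</root>")
    prune_empty(root, settings["empty"])
    # tostring() includes each child's tail text.
    return (root.text or "") + "".join(
        ET.tostring(child, encoding="unicode") for child in root
    )


html = '<em><img src="/index.html"/></em>'

# With the defaults, <img/> counts as empty and is removed, then so is <em>.
before = sanitize(html, DEFAULTS)

# Deep copy, then allow img (mirroring the answer's changes).
my_settings = copy.deepcopy(DEFAULTS)
my_settings["tags"].add("img")
my_settings["empty"].add("img")
after = sanitize(html, my_settings)
defaults_untouched = "img" not in DEFAULTS["tags"] and "img" not in DEFAULTS["empty"]

# The pitfall: dict() copies only the outer dict, so this mutates DEFAULTS.
shallow = dict(DEFAULTS)
shallow["tags"].add("img")
defaults_mutated = "img" in DEFAULTS["tags"]

print(repr(before))
print(after)
print(defaults_untouched, defaults_mutated)
```

Pruning children before their parent is what makes the wrapping <em> disappear as well: once the <img/> is gone, the <em> has nothing left in it.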