Validating whether a string is valid HTML in Python?


What is the best way to check whether a string contains valid HTML with correct syntax?

I tried HTMLParser from the html.parser module, on the assumption that if parsing finishes without an error, the string is valid HTML. That did not work: it parses invalid strings without raising any errors.

from html.parser import HTMLParser

parser = HTMLParser()

parser.feed('<h1> hi')
parser.close()

I expected it to raise an exception because the closing tag is missing, but it didn't.


Solution

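        from bs4 import BeautifulSoup

        st = """<html>
        <head><title>I'm title</title></head>
        </html>"""
        st1 = "who are you"

        # find() with no arguments returns the first tag in the parsed document,
        # or None if the string contains no tags at all. This detects markup;
        # it does not check that the markup is well formed.
        bool(BeautifulSoup(st, "html.parser").find())   # True
        bool(BeautifulSoup(st1, "html.parser").find())  # False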
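
Note that the BeautifulSoup check only tells you whether the string contains at least one tag. It would also report True for the '<h1> hi' string from the question, because the parser quietly accepts the unclosed tag. If you want to flag missing or mismatched closing tags, one option is to subclass HTMLParser and keep a stack of open tags. The following is a rough sketch rather than a full validator: TagBalanceChecker, VOID_ELEMENTS and is_balanced_html are names made up for this example, and the rule is stricter than the HTML specification, which lets some end tags (such as </p> or </li>) be omitted.

```python
from html.parser import HTMLParser

# Void elements never take a closing tag in HTML, so they are not tracked.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "source", "track", "wbr",
}


class TagBalanceChecker(HTMLParser):
    """Hypothetical helper: tracks open tags and records mismatched end tags."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of tag names that are currently open
        self.errors = []     # descriptions of end tags that did not match

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            return
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
        else:
            self.errors.append(f"unexpected </{tag}>")


def is_balanced_html(text):
    """Return True if every non-void start tag is closed in the right order."""
    checker = TagBalanceChecker()
    checker.feed(text)
    checker.close()
    return not checker.errors and not checker.open_tags


print(is_balanced_html("<h1> hi"))       # unclosed <h1>
print(is_balanced_html("<h1> hi</h1>"))  # properly closed
```

A string with no tags at all passes this check, so you may want to combine it with the find() test above if plain text should be rejected. For real conformance checking (doctype, allowed nesting, attributes) a dedicated HTML validator is a better fit than either approach.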