What is the best way to check whether a string contains valid HTML with correct syntax?
I tried HTMLParser from the html.parser module, on the assumption that if parsing raises no error, the string is valid HTML. However, that did not help, because it parses even invalid strings without raising any errors.
from html.parser import HTMLParser
parser = HTMLParser()
parser.feed('<h1> hi')
parser.close()
I expected it to raise an exception, since the closing tag is missing, but it didn't.
>>> from bs4 import BeautifulSoup
>>> st = """<html>
... <head><title>I'm title</title></head>
... </html>"""
>>> st1 = "who are you"
>>> bool(BeautifulSoup(st, "html.parser").find())
True
>>> bool(BeautifulSoup(st1, "html.parser").find())
False
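Note that the find() check above only tells you whether the parser found at least one tag in the string; it does not report a missing closing tag such as the one in '<h1> hi'. If the goal is to catch unclosed or mismatched tags, one possible approach is to subclass HTMLParser and keep a stack of open tags. The following is a minimal sketch, not a full validator: the names TagBalanceChecker and is_balanced_html are hypothetical, the list of void elements is an assumption and may be incomplete, and the check is stricter than the HTML standard, which allows some closing tags (for example </p> and </li>) to be omitted.

```python
from html.parser import HTMLParser

# Assumption: elements that never take a closing tag (may be incomplete).
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class TagBalanceChecker(HTMLParser):
    """Hypothetical helper that records unbalanced tags while parsing."""

    def __init__(self):
        super().__init__()
        self.open_tags = []    # stack of tags opened but not yet closed
        self.errors = []       # closing tags that did not match the stack top
        self.seen_tag = False  # True once any start tag has been parsed

    def handle_starttag(self, tag, attrs):
        self.seen_tag = True
        if tag not in VOID_ELEMENTS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            return
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
        else:
            self.errors.append(f"unexpected </{tag}>")


def is_balanced_html(text):
    """Return True if text has at least one tag and every tag is closed in order."""
    checker = TagBalanceChecker()
    checker.feed(text)
    checker.close()
    return bool(checker.seen_tag and not checker.open_tags and not checker.errors)


print(is_balanced_html("<h1> hi"))       # unclosed <h1>
print(is_balanced_html("<h1> hi</h1>"))  # properly closed
print(is_balanced_html("who are you"))   # no tags at all
```

With this sketch, '<h1> hi' is rejected because h1 is still on the stack when parsing ends. For checking against the actual HTML specification, a dedicated validator is likely a better fit than a hand-rolled stack.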