Validate HTML with BeautifulSoup

I use BeautifulSoup 3.2.1 to parse a lot of HTML files translated with eTranslation.

I found that soup = BeautifulSoup(html_file, "html.parser") sometimes drops a section of my HTML file. This seems to be related to invalid tags or other problems in the HTML.

I also found that soup = BeautifulSoup(html_file, "lxml") works better on this kind of badly written HTML.

Is there a way to detect which HTML file is invalid using BeautifulSoup?

I imagine something like this:

if valid(html_file):
    soup = BeautifulSoup(html_file, "html.parser")
else:
    soup = BeautifulSoup(html_file, "lxml")

Solution

Here is what I did. Since BeautifulSoup fixes invalid HTML when parsing, comparing its output to the original string tells you whether the input was valid.

from bs4 import BeautifulSoup


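def is_valid_HTML_tag(html_string_to_check: str) -> bool:
    # Parse the string, letting BeautifulSoup repair anything it considers broken.
    soup = BeautifulSoup(html_string_to_check, 'html.parser')
    # If serializing the tree reproduces the input exactly, nothing was repaired.
    return html_string_to_check == str(soup)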

print(is_valid_HTML_tag('<div>valid</div>'))
print(is_valid_HTML_tag('<div>invalid'))

gives

True
False

respectively.
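
One caveat with the exact string comparison: it can also return False for input that is well formed, because serializing the parsed tree may normalize the markup (for example, writing <br> back as <br/>). As a possible complement, here is a minimal sketch of a different technique: a tag-balance check built on the standard library's html.parser module, which BeautifulSoup's "html.parser" option is itself based on. The names TagBalanceChecker, find_tag_problems and VOID_TAGS are hypothetical helpers, not part of BeautifulSoup. The sketch assumes that every non-void element needs an explicit closing tag, which is stricter than the HTML specification (it would flag a <p> or <li> whose end tag is legitimately omitted).

```python
from html.parser import HTMLParser

# Elements that never take a closing tag in HTML (the "void" elements).
VOID_TAGS = {
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "source", "track", "wbr",
}


class TagBalanceChecker(HTMLParser):
    """Hypothetical helper: records unclosed or mismatched tags."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of tag names that are currently open
        self.problems = []   # human-readable descriptions of issues found

    def handle_starttag(self, tag, attrs):
        # Void elements are never pushed, since they have no closing tag.
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        # A closing tag is only accepted if it matches the innermost open tag.
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
        else:
            self.problems.append(f"unexpected </{tag}>")


def find_tag_problems(html_string_to_check: str) -> list:
    """Return a list of problems; an empty list means the tags are balanced."""
    checker = TagBalanceChecker()
    checker.feed(html_string_to_check)
    checker.close()
    # Anything still on the stack was opened but never closed.
    leftovers = [f"<{tag}> never closed" for tag in checker.open_tags]
    return checker.problems + leftovers


print(find_tag_problems('<div>valid</div>'))
print(find_tag_problems('<div>invalid'))
```

An empty list from find_tag_problems could then stand in for the valid(html_file) test in the question: parse with "html.parser" when the list is empty and fall back to "lxml" otherwise. Because the check only looks at tag nesting, it is a heuristic rather than a full HTML validator.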