python html-parsing beautifulsoup lxml pyquery

What’s the most forgiving HTML parser in Python?

I have some random HTML that I tried to parse with BeautifulSoup, but in most cases (>70%) it chokes. I tried BeautifulSoup 3.0.8 and 3.2.0 (there were some problems with 3.1.0 upwards), but the results are almost the same.

I can recall several HTML parser options available in Python off the top of my head:

BeautifulSoup
lxml
pyquery

I intend to test all of these, but I wanted to know which one, in your tests, comes out as the most forgiving and will still try to parse bad HTML.

Solution

I ended up using BeautifulSoup 4.0 with html5lib as the parser, which is much more forgiving. With some modifications to my code it's now working considerably well. Thanks all for the suggestions.
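
For anyone landing here later, this is roughly what that setup looks like. It is only a minimal sketch, assuming the `beautifulsoup4` and `html5lib` packages are installed. The sample markup, the helper name `parse_leniently`, and the fallback order are my own illustration and not the code I actually ran.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Deliberately broken markup: unclosed <p> and <b>, no <html> or <body>.
BROKEN_HTML = "<p>Unclosed <b>bold<p>Second paragraph"


def parse_leniently(markup, parsers=("html5lib", "lxml", "html.parser")):
    """Return (soup, parser_name) for the first parser that is installed.

    The order is an assumption: html5lib first because it follows the
    browser-style HTML5 error recovery rules, then lxml, then the
    standard-library parser, which is always available.
    """
    for name in parsers:
        try:
            return BeautifulSoup(markup, name), name
        except FeatureNotFound:
            # bs4 raises FeatureNotFound when the requested parser
            # library is not installed; try the next one.
            continue
    raise RuntimeError("no usable parser found")


soup, used = parse_leniently(BROKEN_HTML)
print("parser used:", used)
print([p.get_text() for p in soup.find_all("p")])
```

The three parsers repair the same broken input differently, so the resulting tree may not be identical from one to the next. If your code depends on the tree shape, it is probably worth pinning one parser explicitly instead of relying on the fallback.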