
Python overridable functions from HTMLParser


I understand how to use handle_starttag from HTMLParser, but I am very confused about how it works under the hood.

https://docs.python.org/3/library/html.parser.html#example-html-parser-application

The doc says one needs to override this handle_starttag method and it indeed works as expected.

However, when I check the definition in the parent class (HTMLParser), the method body is nothing but a "pass".

So how does handle_starttag work? How does Python know which argument is the tag and which is the list of attributes if the parent definition is empty? Happy to clarify if my question is not clear. Thanks in advance.


Solution

  • By default, handle_starttag doesn't do anything. It's only there to be overridden. Knowing what's a tag and what's an attribute isn't handle_starttag's job; it's the job of other code. Nothing is the default handle_starttag's job.

    Calling handle_starttag is HTMLParser's way of asking subclasses, "Hey, do you want to do anything with this start tag I just parsed?" An overridden handle_starttag is a subclass's way of responding, "Yeah, thanks, I'll do the thing I do with start tags." If it's not overridden, the call does nothing. Either way, after it's called, parsing just goes on.
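
To make the hook idea concrete, here is a rough sketch. TinyParser is a hypothetical, heavily simplified stand-in and is not how HTMLParser is actually implemented; it only shows the shape of the pattern, where the base class decides what a tag is and then calls a do-nothing method that a subclass may override. TagCollector then does the same thing with the real html.parser.HTMLParser. The class names and the sample input are made up for illustration.

```python
from html.parser import HTMLParser


class TinyParser:
    """Hypothetical toy base class (NOT the real HTMLParser source)."""

    def feed(self, tokens):
        # The base class does the "parsing": it decides what is a tag
        # and passes the pieces to the hook as arguments.
        for token in tokens:
            if token.startswith("<") and token.endswith(">"):
                self.handle_starttag(token[1:-1], [])

    def handle_starttag(self, tag, attrs):
        # Default hook: intentionally does nothing.
        pass


class TinyCollector(TinyParser):
    def __init__(self):
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # The override only reacts; it never parses anything itself.
        self.tags.append(tag)


class TagCollector(HTMLParser):
    """Same idea, using the real standard-library parser."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        # tag and attrs arrive already split out by HTMLParser.
        self.seen.append((tag, attrs))


tiny = TinyCollector()
tiny.feed(["<p>", "hello", "<b>"])
print(tiny.tags)  # ['p', 'b']

# With no override, the real parser calls the empty hook and moves on.
HTMLParser().feed('<a href="https://example.com">link</a>')

collector = TagCollector()
collector.feed('<a href="https://example.com" class="x">link</a>')
print(collector.seen)
# [('a', [('href', 'https://example.com'), ('class', 'x')])]
```

In both cases the subclass never works out what the tag or the attributes are. It just receives them as arguments from the parent's parsing code, which is why an empty "pass" in the parent is enough.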