python initializer html-parser

Subclass __init__ method ignored - execution jumps straight to superclass __init__


I am using HTMLParser to parse some basic, well-formed HTML and, for various reasons, don't want to use BeautifulSoup. I subclassed HTMLParser, and the parsing itself works fine. However, the __init__ method of the subclass is not being called. Instead, when I create a new subclass object, the __init__ method of HTMLParser is called directly, and the subclass __init__ is never called at all. This happens when I inherit from HTMLParser.HTMLParser as well as from htmllib.HTMLParser. Here's the code:

import formatter
import htmllib

class MyHtmlParser(htmllib.HTMLParser):

    def _init_(self, formatter):
        print("in init")
        htmllib.HTMLParser.__init__(self, formatter)        
        self.links = []
        self.is_li = False
        self.close_a = False
        self.close_li = False
        print "initialized"


    def get_links(self):
        return self.links

    def handle_starttag(self, tag, attrs):
        pass  # some functionality here - this works

    def handle_endtag(self, tag):
        pass  # some functionality here - this works

myparser = MyHtmlParser(formatter.NullFormatter())

Solution

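  • Your method is named _init_, with one underscore on each side, so two underscores are missing. Python only treats __init__ (two underscores on each side) as the initializer; _init_ is just an ordinary method that nothing calls, so creating the object runs the __init__ inherited from HTMLParser instead. The definition should be: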

    def __init__(self, formatter):
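
To see the difference in isolation, here is a minimal sketch. It assumes Python 3, where htmllib no longer exists, so it uses html.parser.HTMLParser as a stand-in for the base class (its constructor takes no formatter argument). The class names BrokenParser and FixedParser and the sample HTML are made up for illustration and are not from the question.

```python
from html.parser import HTMLParser  # Python 3 stand-in for htmllib.HTMLParser


class BrokenParser(HTMLParser):
    # Single underscores: an ordinary method that Python never calls
    # automatically when the object is created.
    def _init_(self):
        HTMLParser.__init__(self)
        self.links = []


class FixedParser(HTMLParser):
    # Double underscores: this is the real initializer.
    def __init__(self):
        HTMLParser.__init__(self)
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Collect href values from anchor tags.
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)


# Only the inherited HTMLParser.__init__ runs here, so self.links is never set.
broken = BrokenParser()
has_links_broken = hasattr(broken, "links")

# Here the subclass initializer runs first and sets up self.links.
fixed = FixedParser()
fixed.feed('<ul><li><a href="http://example.com/">x</a></li></ul>')
print(has_links_broken, fixed.links)
```

With the single-underscore name, the object is still created without any error, which is why the mistake is easy to miss: the parser works, but none of the attributes set in the subclass exist.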