Python HTMLParser printing out blank lines


I'm experimenting with Python's HTMLParser and it keeps printing blank lines that I don't want.

from HTMLParser import HTMLParser
import urllib2

class MyHTMLParser(HTMLParser):
    def handle_data(self, data):
        print "Encountered some data  :", data

# Instantiate the parser and feed it some HTML
url = 'http://www.ngccoin.com/price-guide/us/flying-eagle-cents-pscid-16-desig-ms'
req = urllib2.Request(url, headers={'User-Agent': "Magic Browser"})
response = urllib2.urlopen(req)
html = response.read()

parser = MyHTMLParser()
parser.feed(html)

My issue is that when it hits a data section, it prints bare newlines as well as the actual data. My output looks a lot like:

Encountered some data  :

Encountered some data  : Official Grading Service of
Encountered some data  :

Encountered some data  :

Encountered some data  :

How can I get it to ignore the data that consists of just a newline?


Solution

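  • HTMLParser calls handle_data for every run of text between tags, including the newlines and indentation that separate them. Simply have it ignore data that is just a newline: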

    def handle_data(self, data):
        if data == '\n':
            return
        print "Encountered some data  :", data
    

    Or, have it ignore any data consisting of only whitespace:

    def handle_data(self, data):
        if not data.strip():
            return
        print "Encountered some data  :", data
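
    For reference, here is a minimal sketch of the whitespace check in Python 3, where the module is html.parser rather than HTMLParser. The TextCollector class name and the sample_html string are made up for illustration, so the example runs without fetching the page from the question. It collects the stripped text in a list instead of printing inside the handler, which also makes the result easy to inspect:

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical helper: keeps only text chunks that contain non-whitespace."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:  # skip chunks made up only of newlines, spaces, or tabs
            self.chunks.append(text)


# Made-up sample markup with the kind of newlines and indentation
# that sit between tags on a real page.
sample_html = (
    "<div>\n"
    "  <p>Official Grading Service of</p>\n"
    "\n"
    "  <span> Price Guide </span>\n"
    "</div>\n"
)

parser = TextCollector()
parser.feed(sample_html)
parser.close()

for chunk in parser.chunks:
    print("Encountered some data  :", chunk)
```

    Stripping before the check also removes leading and trailing whitespace from the chunks that are kept. If that whitespace matters, test data.strip() but store data unchanged.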