I'm playing around with Python's HTMLParser and having an issue with it printing out blank lines.
from HTMLParser import HTMLParser
import urllib2

class MyHTMLParser(HTMLParser):
    def handle_data(self, data):
        print "Encountered some data :", data

# instantiate the parser and feed it some HTML
url = 'http://www.ngccoin.com/price-guide/us/flying-eagle-cents-pscid-16-desig-ms'
req = urllib2.Request(url, headers={'User-Agent': "Magic Browser"})
response = urllib2.urlopen(req)
html = response.read()
parser = MyHTMLParser()
parser.feed(html)
My issue is that when it hits a data section, it prints lines containing only a newline as well as the actual data. My output looks a lot like:
Encountered some data :
Encountered some data : Official Grading Service of
Encountered some data :
Encountered some data :
Encountered some data :
How should I go about getting it to ignore the data that is just a newline?
Have handle_data return early when the data is just a newline:
def handle_data(self, data):
    if data == '\n':
        return
    print "Encountered some data :", data
Or, have it ignore any data consisting of only whitespace:
def handle_data(self, data):
    if not data.strip():
        return
    print "Encountered some data :", data
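For a version you can run without hitting the network, here is a minimal sketch of the whitespace check applied to an inline HTML string. The class name NonBlankDataParser, the chunks list, and the sample_html string are made up for illustration and are not from the question. The import is wrapped in try/except so the same code should work on Python 3 (where the module is html.parser) and on Python 2 as used above.

```python
try:
    from html.parser import HTMLParser  # Python 3 module name
except ImportError:
    from HTMLParser import HTMLParser  # Python 2, as in the question


class NonBlankDataParser(HTMLParser):
    """Hypothetical parser that keeps only text nodes with visible content."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.chunks = []  # collected text, instead of printing directly

    def handle_data(self, data):
        text = data.strip()
        if not text:
            # data was empty or only whitespace ('\n', '\n\n', '  ', ...)
            return
        self.chunks.append(text)


# Made-up sample input: real text surrounded by whitespace-only text nodes.
sample_html = "<div>\n<p>Official Grading Service of</p>\n\n<span>  </span>\n</div>"

parser = NonBlankDataParser()
parser.feed(sample_html)
parser.close()
print(parser.chunks)
```

Note that the `data == '\n'` check would still let through the '\n\n' and the two-space text nodes in this sample, which is why the strip() version is usually the safer of the two.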