python flask beautifulsoup mechanize-python

Can print but not return html table: "TypeError: ResultSet object is not an iterator"

Python newbie here. Python 2.7 with beautifulsoup 3.2.1.

I'm trying to scrape a table from a simple page. I can easily get it to print, but I can't get it to return to my view function.

The following works:

@app.route('/process')
def process():

   queryURL = 'http://example.com'

   br.open(queryURL)

   html = br.response().read()
   soup = BeautifulSoup(html)
   table = soup.find("table")
   print table

   return 'All good'

I can also return html successfully. But when I try to return table instead of return 'All good' I get the following error:

TypeError: ResultSet object is not an iterator

I also tried:

br.open(queryURL)

html = br.response().read()
soup = BeautifulSoup(html)
table = soup.find("table")
out = []
for row in table.findAll('tr'):
    colvals = [col.text for col in row.findAll('td')]
    out.append('\t'.join(colvals))

return table

With no success. Any suggestions?

Solution

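You're trying to return the table object itself rather than its text. A view function needs to return a string (or a response), not a BeautifulSoup object, so return table.text should be what you are looking for. (In your second attempt you build the rows in out but still return table; returning '\n'.join(out) would give you the tab-separated values.) Full modified code: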

@app.route('/process')
def process():

   queryURL = 'http://example.com'

   br.open(queryURL)

   html = br.response().read()
   soup = BeautifulSoup(html)
   table = soup.find("table")
   # Return the table's text (a string) rather than the Tag object itself
   return table.text
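
For what it's worth, the odd wording of the error seems to come from what happens to the object after it is returned. As far as I can tell, Flask treats a return value that is not a string as a WSGI application and calls it, and calling a BeautifulSoup 3 tag is a shortcut for findAll, which returns a ResultSet (a list subclass). The sketch below only imitates that with stand-in classes: FakeTag, this ResultSet and call_like_wsgi_app are made up for illustration and are not the real BeautifulSoup or Flask code. It uses only built-ins and should run on Python 2 or 3.

```python
class ResultSet(list):
    """Stand-in for BeautifulSoup's ResultSet, which is a list subclass."""


class FakeTag(object):
    """Hypothetical stand-in for a BeautifulSoup tag (not the real class)."""

    def __call__(self, *args, **kwargs):
        # Calling a real tag is a shortcut for findAll(); imitate that
        # by handing back an (empty) ResultSet.
        return ResultSet()


def call_like_wsgi_app(obj):
    """Call obj the way a WSGI app is called, then ask for the first chunk."""
    app_iter = obj({}, lambda status, headers: None)
    # A list is iterable, but it is not itself an iterator, so next() fails.
    return next(app_iter)


try:
    call_like_wsgi_app(FakeTag())
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

If that reading is right, the error has nothing to do with the table's contents: any non-string object that happens to be callable would fail in a similar way, which is why returning a string fixes it.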

EDIT:

Since I understand now that you want the HTML code that forms the site instead of the values, you can do something like this example I made:

import urllib

# Fetch the page and read it line by line (Python 2's urllib)
url = urllib.urlopen('http://www.xpn.org/events/concert-calendar')
htmldata = url.readlines()
url.close()

tableTags = ('<th', '<tr', '<thead', '<tbody', '<td')

for line in htmldata:
    # Print each line once if it contains any of the table tags
    if any(t in line for t in tableTags):
        print line

As far as I know you can't do this with BeautifulSoup, because it is meant more for parsing HTML or printing it in a nice-looking manner. (That said, since print table already shows the markup, return str(table) may be worth trying first.) Otherwise you can do what I did: have a for loop go through the lines of the HTML code and print each line that contains one of the table tags.

If you want to store the output in a list to use later, you would do something like:

htmlCodeList = []
tableTags = ('<th', '<tr', '<thead', '<tbody', '<td')
for line in htmldata:
    # Store each matching line once
    if any(t in line for t in tableTags):
        htmlCodeList.append(line)

This saves each matching HTML line as a new element of the list, so the first matching line is at index 0, the next matching line is at index 1, and so on.
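
Since the view needs to return a string, the stored lines would have to be joined back together before returning them. A minimal sketch, assuming a hypothetical helper name (tableLinesToString) and a small hard-coded sample in place of the downloaded page:

```python
TABLE_TAGS = ('<th', '<tr', '<thead', '<tbody', '<td')


def tableLinesToString(lines):
    """Keep the lines that contain a table tag and join them into one string."""
    kept = [line for line in lines if any(tag in line for tag in TABLE_TAGS)]
    return ''.join(kept)


# Hard-coded sample standing in for url.readlines(); each line keeps its newline.
sampleLines = [
    '<html><body>\n',
    '<p>Concerts</p>\n',
    '<table>\n',
    '<tr><th>Date</th><th>Artist</th></tr>\n',
    '<tr><td>Friday</td><td>Some Band</td></tr>\n',
    '</table>\n',
    '</body></html>\n',
]

tableHtml = tableLinesToString(sampleLines)
print(tableHtml)
```

Note that the <table> and </table> lines themselves are not matched by this tag list, so you would need to add '<table' and '</table' to it (or wrap the result yourself) if you want a complete table element.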