python flask beautifulsoup mechanize-python

Can print but not return html table: "TypeError: ResultSet object is not an iterator"


Python newbie here. Python 2.7 with beautifulsoup 3.2.1.

I'm trying to scrape a table from a simple page. I can easily get it to print, but I can't get it to return to my view function.

The following works:

@app.route('/process')
def process():

   queryURL = 'http://example.com'

   br.open(queryURL)

   html = br.response().read()
   soup = BeautifulSoup(html)
   table = soup.find("table")
   print table

   return 'All good'

I can also return html successfully. But when I try to return table instead of return 'All good' I get the following error:

TypeError: ResultSet object is not an iterator

I also tried:

br.open(queryURL)

html = br.response().read()
soup = BeautifulSoup(html)
table = soup.find("table")
out = []
for row in table.findAll('tr'):
    colvals = [col.text for col in row.findAll('td')]
    out.append('\t'.join(colvals))

return table

With no success. Any suggestions?


Solution

  • You're returning the tag object itself rather than its text. Flask needs a string (or a response object) back from the view, so return table.text should be what you are looking for. Full modified code:

    def process():
    
       queryURL = 'http://example.com'
    
       br.open(queryURL)
    
       html = br.response().read()
       soup = BeautifulSoup(html)
       table = soup.find("table")
       return table.text
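
If tab-separated rows are preferable to one run of text, the loop from the question can be reused, as long as the view returns the joined out list rather than table. A minimal sketch, with hard-coded sample rows standing in for the cell texts that row.findAll('td') would yield (the sample data and the name rows_to_text are assumptions for illustration):

```python
def rows_to_text(rows):
    """Join each row's cells with tabs and the rows with newlines."""
    out = []
    for cells in rows:
        out.append('\t'.join(cells))
    # Return a single string, which a Flask view can send as the response body.
    return '\n'.join(out)


# Hypothetical sample data standing in for scraped <td> texts.
sample_rows = [['Artist', 'Venue'], ['Band A', 'Hall 1']]
table_text = rows_to_text(sample_rows)
```

In the view, the list passed in would be built from the scraped rows, and the function's result is what gets returned.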
    

    EDIT:

    Since I understand now that you want the HTML code of the table rather than its values, you can do something like this example I made:

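    import urllib

    # Python 2: fetch the page and read it line by line.
    url = urllib.urlopen('http://www.xpn.org/events/concert-calendar')
    htmldata = url.readlines()
    url.close()

    # Opening-tag prefixes to look for; '<th' also matches '<thead'.
    table_tags = ('<th', '<tr', '<tbody', '<td')

    for line in htmldata:
        # Print each matching line once, even if it contains several tags.
        if any(tag in line for tag in table_tags):
            print line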
    

    BeautifulSoup can hand back the markup too: print table already shows the HTML, so str(table) gives you that same HTML as a string, which the view can return. If you'd rather skip the parser, you can do what I did: loop through the HTML line by line and, if a line contains one of the table tags, print it.

    If you want to store the output in a list to use later, you would do something like:

    htmlCodeList = []
    table_tags = ('<th', '<tr', '<tbody', '<td')
    for line in htmldata:
        # Store each matching line once, even if it contains several tags.
        if any(tag in line for tag in table_tags):
            htmlCodeList.append(line)
    

    This saves each matching HTML line as a new element of the list, so the first matching line is at index 0, the next one is at index 1, etc.
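
To get the collected lines back out of the view, they have to be joined into a single string first. A minimal sketch of that step, run on hard-coded sample lines instead of a live page; the names TABLE_TAGS, extract_table_markup and sample_lines are assumptions for illustration, and the '<table' and '</table' entries are an addition to the tag list used above:

```python
# Tag prefixes to keep; '<th' also matches '<thead'.
TABLE_TAGS = ('<table', '</table', '<tbody', '<tr', '<th', '<td')


def extract_table_markup(lines):
    """Return the lines containing a table-related tag, joined into one string."""
    kept = [line for line in lines if any(tag in line for tag in TABLE_TAGS)]
    return ''.join(kept)


# Hypothetical sample input standing in for url.readlines().
sample_lines = [
    '<html><body>\n',
    '<p>Intro</p>\n',
    '<table>\n',
    '<tr><td>Band A</td><td>Hall 1</td></tr>\n',
    '</table>\n',
    '</body></html>\n',
]
markup = extract_table_markup(sample_lines)
```

Line matching like this only works when every cell sits on the same line as its tag; cell text that wraps onto a line of its own would be dropped, which is one reason to prefer the parsed table when it is available.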