Tags: python, html, web-scraping, beautifulsoup, data-extraction

TypeError: 'NoneType' object is not callable (Python: Scraping from HTML data)


I am trying to scrape the numbers from HTML data so that I can get their sum. However, running the script raises the error above, and the traceback points to the "data = " line. What does this error mean for that line? Have I set up the "for" loop correctly? Thanks for any thoughts.

import urllib
from bs4 import BeautifulSoup

url = "http://python-data.dr-chuck.net/comments_42.html"
html = urllib.urlopen(url).read()

soup = BeautifulSoup(html, "html.parser")
tags = soup('span')
data = soup.findall("span", {"Comments":"Comments"})
numbers = [d.text for d in data]

summation = 0
for tag in tags:
    print tags
    y= tag.finall("span").text      
    summation = summation + int(y)                  
print summation

This is what the HTML data looks like:

<tr><td>Modu</td><td><span class="comments">90</span></td></tr>
<tr><td>Kenzie</td><td><span class="comments">88</span></td></tr>
<tr><td>Hubert</td><td><span class="comments">87</span></td></tr>

Solution

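  • First of all, there is no findall() method in BeautifulSoup; the method is called find_all() (older code uses the camelCase alias findAll()). Because BeautifulSoup treats an unknown attribute as a search for a tag with that name, soup.findall is the same as soup.find("findall"), which returns None because the page has no <findall> tag. Calling that None is what raises TypeError: 'NoneType' object is not callable. The tag.finall(...) call inside your loop has the same problem. Also, you are searching for span elements that have a Comments attribute with the value Comments: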

    soup.findall("span", {"Comments":"Comments"})  
    

    And since this is Python, you can add the values up more easily with the built-in sum().

    Fixed version (the parenthesized print() call works on both Python 2 and Python 3):

    data = soup.find_all("span", {"class": "comments"})
    print(sum(int(d.text) for d in data))  # prints 2482
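The error message makes sense once you see how BeautifulSoup handles unknown attribute names. The following is a minimal Python 3 sketch, assuming bs4 is installed; it runs offline on the three sample rows from the question, reproduces the TypeError, and then applies the find_all() fix. The names SAMPLE_HTML, missing, spans and total are illustrative only.

```python
from bs4 import BeautifulSoup

# Illustrative: the three sample rows from the question, so this runs offline.
SAMPLE_HTML = """
<tr><td>Modu</td><td><span class="comments">90</span></td></tr>
<tr><td>Kenzie</td><td><span class="comments">88</span></td></tr>
<tr><td>Hubert</td><td><span class="comments">87</span></td></tr>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# An unknown attribute is treated as a tag search: soup.findall means
# soup.find("findall"). There is no <findall> tag, so the result is None.
missing = soup.findall
print(missing)  # None

# Calling that None is what produces the error from the question.
try:
    soup.findall("span")
except TypeError as exc:
    print(exc)  # 'NoneType' object is not callable

# The original attribute filter matches nothing: no span has a Comments attribute.
print(soup.find_all("span", {"Comments": "Comments"}))  # []

# Filtering on the class attribute finds the three spans.
spans = soup.find_all("span", class_="comments")
total = sum(int(span.text) for span in spans)
print(total)  # 265
```

The class_ keyword (with a trailing underscore, because class is a Python keyword) is equivalent to the {"class": "comments"} dictionary used in the fixed version.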
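On the "for" loop question: the loop iterates over tags but prints tags (the whole list) instead of tag, and each tag is already a <span>, so there is nothing to search for inside it. If you prefer an explicit loop to sum(), a Python 3 sketch could look like the following. It assumes bs4 is installed; the function names sum_comment_counts and sum_from_url are hypothetical, and sum_from_url assumes the course page is still reachable.

```python
from urllib.request import urlopen

from bs4 import BeautifulSoup

URL = "http://python-data.dr-chuck.net/comments_42.html"


def sum_comment_counts(html):
    """Hypothetical helper: add up the integers in <span class="comments"> tags."""
    soup = BeautifulSoup(html, "html.parser")
    summation = 0
    for tag in soup.find_all("span", class_="comments"):
        # Each tag is already a <span>, so read its text directly
        # instead of searching inside it again.
        summation += int(tag.text)
    return summation


def sum_from_url(url=URL):
    """Hypothetical helper: fetch a page and sum its comment counts."""
    # Python 3 moved urlopen() from urllib into urllib.request.
    with urlopen(url) as response:
        return sum_comment_counts(response.read())
```

For example, print(sum_from_url()) fetches the page and prints the total. Keeping the parsing in its own function also lets you check it against a saved copy of the HTML without network access.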