I am trying to scrape the numbers from an HTML page so that I can sum them. However, I get the above error when I run the code below; it points to the "data = " line. What does the error mean for this line of code? Have I set up the "for" loop correctly? Thank you for your thoughts.
import urllib
from bs4 import BeautifulSoup
url = "http://python-data.dr-chuck.net/comments_42.html"
html = urllib.urlopen(url).read()
soup = BeautifulSoup(html, "html.parser")
tags = soup('span')
data = soup.findall("span", {"Comments":"Comments"})
numbers = [d.text for d in data]
summation = 0
for tag in tags:
    print tags
    y= tag.finall("span").text
    summation = summation + int(y)
print summation
This is what the HTML data looks like:
<tr><td>Modu</td><td><span class="comments">90</span></td></tr>
<tr><td>Kenzie</td><td><span class="comments">88</span></td></tr>
<tr><td>Hubert</td><td><span class="comments">87</span></td></tr>
First of all, there is no findall() method in BeautifulSoup; there is find_all(). Also, you are basically searching for elements that have a Comments attribute with a Comments value:
soup.findall("span", {"Comments":"Comments"})
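To see both problems in isolation, here is a minimal sketch that runs against the three sample rows from the question, inlined as a string so that no network access is needed. It assumes bs4 is installed; the variable names (missing_method, by_wrong_attr, by_class) and the wrapping table element are only there for illustration. As far as I know, BeautifulSoup treats an unknown attribute name such as findall as a tag lookup, which would explain why the failure shows up on the "data = " line.

```python
from bs4 import BeautifulSoup

# Three sample rows copied from the question, inlined so the sketch runs offline.
html = """
<table>
<tr><td>Modu</td><td><span class="comments">90</span></td></tr>
<tr><td>Kenzie</td><td><span class="comments">88</span></td></tr>
<tr><td>Hubert</td><td><span class="comments">87</span></td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# An unknown attribute name is treated like soup.find("findall"):
# a search for a <findall> tag, which does not exist in this document.
missing_method = soup.findall

# Searches for <span Comments="Comments">; no span carries that attribute.
by_wrong_attr = soup.find_all("span", {"Comments": "Comments"})

# Searches for <span class="comments">, which is what the rows contain.
by_class = soup.find_all("span", {"class": "comments"})

print(missing_method)
print(len(by_wrong_attr))
print(len(by_class))
```

Calling the result of soup.findall means calling whatever that tag lookup returned, so the call fails before the filter dictionary is ever looked at.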
And since this is Python, you can sum the values much more easily with the built-in sum().
Fixed version:
data = soup.find_all("span", {"class": "comments"})
print sum(int(d.text) for d in data) # prints 2482
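If you want to check the approach without hitting the network, the following sketch applies the same find_all() plus sum() idea to the three sample rows from the question. The helper name sum_comments is hypothetical, and the code assumes bs4 is installed; it is written so that it should run on both Python 2 and Python 3.

```python
from bs4 import BeautifulSoup

# The three sample rows from the question (not the full page).
SAMPLE_HTML = """
<table>
<tr><td>Modu</td><td><span class="comments">90</span></td></tr>
<tr><td>Kenzie</td><td><span class="comments">88</span></td></tr>
<tr><td>Hubert</td><td><span class="comments">87</span></td></tr>
</table>
"""


def sum_comments(html):
    """Hypothetical helper: add up the integers inside <span class="comments">."""
    soup = BeautifulSoup(html, "html.parser")
    spans = soup.find_all("span", {"class": "comments"})
    # .text is the string inside each span; strip() guards against stray whitespace.
    return sum(int(span.text.strip()) for span in spans)


total = sum_comments(SAMPLE_HTML)
print(total)  # 90 + 88 + 87 = 265 for the three sample rows
```

The total here covers only the three sample rows, so it differs from the full-page result. On Python 3, the page itself would be fetched with urllib.request.urlopen(url).read() instead of urllib.urlopen(url).read(), and the result could be passed to the same helper.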