Okay, I'm at my wit's end here. For my class, we are supposed to scrape data from the wunderground.com website. We keep running into error messages, or the code runs OK but the .txt file contains NO data. It's pretty annoying, because I need to get this working! Here is my code:
f = open('wunder-data1.txt', 'w')
for m in range(1, 13):
    for d in range(1, 32):
        if (m == 2 and d > 28):
            break
        elif (m in [4, 6, 9, 11] and d > 30):
            break
        url = "http://www.wunderground.com/history/airport/KBUF/2009/" + str(m) + "/" + str(d) + "/DailyHistory.html"
        page = urllib2.urlopen(url)
        soup = BeautifulSoup(page, "html.parser")
        dayTemp = soup.find("span", text="Mean Temperature").parent.find_next_sibling("td").get_text(strip=True)
        if len(str(m)) < 2:
            mStamp = '0' + str(m)
        else:
            mStamp = str(m)
        if len(str(d)) < 2:
            dStamp = '0' + str(d)
        else:
            dStamp = str(d)
        timestamp = '2009' + mStamp + dStamp
        f.write(timestamp.encode('utf-8') + ',' + dayTemp + '\n')
f.close()
Also, sorry: the indentation shown here is probably not exactly what I have in my Python file. I'm not any good at this.
UPDATE: So someone answered the question below, and it worked, but I realized I was pulling the wrong data (oops). So I put in this:
import codecs
import urllib2
from bs4 import BeautifulSoup

f = codecs.open('wunder-data2.txt', 'w', 'utf-8')
for m in range(1, 13):
    for d in range(1, 32):
        if (m == 2 and d > 28):
            break
        elif (m in [4, 6, 9, 11] and d > 30):
            break
        url = "http://www.wunderground.com/history/airport/KBUF/2009/" + str(m) + "/" + str(d) + "/DailyHistory.html"
        page = urllib2.urlopen(url)
        soup = BeautifulSoup(page, "html.parser")
        dayTemp = soup.findAll(attrs={"class":"wx-value"})[5].span.string
        if len(str(m)) < 2:
            mStamp = '0' + str(m)
        else:
            mStamp = str(m)
        if len(str(d)) < 2:
            dStamp = '0' + str(d)
        else:
            dStamp = str(d)
        timestamp = '2009' + mStamp + dStamp
        f.write(timestamp.encode('utf-8') + ',' + dayTemp + '\n')
f.close()
So I'm pretty unsure. What I'm trying to do is data scrape the
I encountered the following errors (and fixed them below) when trying to execute your code:
I had to use the codecs module to open the file f as "utf-8".
Now, as far as I can tell (without you telling us what you actually want this code to do), it's working? At least no errors are immediately popping up...
import codecs
import urllib2
from bs4 import BeautifulSoup

# Open the output file through codecs so unicode text is encoded as UTF-8 on write
f = codecs.open('wunder-data1.txt', 'w', 'utf-8')
for m in range(1, 13):
    for d in range(1, 32):
        # Stop the day loop once we pass the last day of the month (2009 is not a leap year)
        if (m == 2 and d > 28):
            break
        elif (m in [4, 6, 9, 11] and d > 30):
            break
        url = "http://www.wunderground.com/history/airport/KBUF/2009/" + str(m) + "/" + str(d) + "/DailyHistory.html"
        page = urllib2.urlopen(url)
        soup = BeautifulSoup(page, "html.parser")
        # Find the "Mean Temperature" label, then read the table cell next to it
        dayTemp = soup.find("span", text="Mean Temperature").parent.find_next_sibling("td").get_text(strip=True)
        # Zero-pad month and day so the timestamp is always YYYYMMDD
        if len(str(m)) < 2:
            mStamp = '0' + str(m)
        else:
            mStamp = str(m)
        if len(str(d)) < 2:
            dStamp = '0' + str(d)
        else:
            dStamp = str(d)
        timestamp = '2009' + mStamp + dStamp
        f.write(timestamp.encode('utf-8') + ',' + dayTemp + '\n')
f.close()
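To show the codecs part in isolation, here is a minimal sketch that involves no scraping. It assumes the scraped strings may contain non-ASCII characters such as a degree sign; in Python 2, writing such unicode text to a file from plain open() raises a UnicodeEncodeError, while a file from codecs.open encodes it for you. The helper write_rows, the file name wunder-sample.txt, and the two rows are made-up placeholders, not real weather data.

```python
import codecs
import os
import tempfile


def write_rows(path, rows):
    """Write (timestamp, temperature) pairs as UTF-8 encoded CSV lines."""
    f = codecs.open(path, 'w', 'utf-8')
    try:
        for timestamp, temp in rows:
            # Both values are unicode text; codecs encodes them on write
            f.write(timestamp + u',' + temp + u'\n')
    finally:
        f.close()


# Placeholder rows (NOT real data); \u00b0 is the degree sign
sample_rows = [(u'20090101', u'-3 \u00b0C'), (u'20090102', u'1 \u00b0C')]

out_path = os.path.join(tempfile.mkdtemp(), 'wunder-sample.txt')
write_rows(out_path, sample_rows)

# Read the file back through codecs to confirm the round trip
f = codecs.open(out_path, 'r', 'utf-8')
content = f.read()
f.close()
```

The try/finally makes sure the file is closed even if one of the writes fails, which also helps avoid ending up with an empty output file after an error partway through.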
As the comments on your question have suggested, there are other areas for improvement here which I have not touched on - I've simply tried to get the code you posted executing.
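One such improvement, sketched here under my own assumptions rather than taken from the comments: let the datetime module generate the dates, so the month-length checks and the manual zero-padding go away. The names BASE_URL, days_in_year, build_url and build_timestamp are hypothetical helpers I made up for illustration; the URL pattern is the one from your code.

```python
from datetime import date, timedelta

# Same URL pattern as in the question, with placeholders for year/month/day
BASE_URL = ("http://www.wunderground.com/history/airport/KBUF/"
            "{y}/{m}/{d}/DailyHistory.html")


def days_in_year(year):
    """Yield every calendar date in the given year, in order."""
    current = date(year, 1, 1)
    while current.year == year:
        yield current
        current += timedelta(days=1)


def build_url(day):
    """Build the history page URL for one date (month/day not zero-padded)."""
    return BASE_URL.format(y=day.year, m=day.month, d=day.day)


def build_timestamp(day):
    """Format a date as YYYYMMDD, zero-padded by strftime."""
    return day.strftime("%Y%m%d")


days = list(days_in_year(2009))
```

In the scraping loop you would then iterate with for day in days_in_year(2009): and call build_url(day) and build_timestamp(day), instead of nesting the m and d loops.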