Hi, I was wondering how I can use BeautifulSoup to scrape Bank of America for its hours. For example, if the URL is http://locators.bankofamerica.com/locator/locator/2129_Shattuck_Ave_94704_BERKELEY_CA/bank_branch_locations/, how can I extract only the hours? Below is my initial attempt, but it seems to return nothing.
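import urllib2
from bs4 import BeautifulSoup

url = "http://locators.bankofamerica.com/locator/locator/2129_Shattuck_Ave_94704_BERKELEY_CA/bank_branch_locations/"
page = urllib2.urlopen(url)
soup = BeautifulSoup(page.read(), "html.parser")
hours = soup.find_all("div", class_="lobbyHours")
print hours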
That URL redirects, which is why soup.find_all("div", class_="lobbyHours") returns nothing. There is no div with that class on the page you're redirected to.
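If you would rather confirm the redirect from Python than from the browser, one option is to compare the URL you asked for with the URL the response finally came from, since urlopen follows redirects on its own. This is only a rough sketch: was_redirected and fetch_final_url are names made up for the example, and the try/except import is there so the same code runs with urllib2 or with Python 3's urllib.request.

```python
try:
    from urllib2 import urlopen  # Python 2, as in the question
except ImportError:
    from urllib.request import urlopen  # Python 3 equivalent


def was_redirected(requested_url, final_url):
    """Return True when the response came from a different URL than requested."""
    # Ignore a trailing slash so "/page/" and "/page" count as the same URL.
    return requested_url.rstrip("/") != final_url.rstrip("/")


def fetch_final_url(url):
    """Open url (redirects are followed automatically) and return the final URL."""
    response = urlopen(url)
    try:
        return response.geturl()
    finally:
        response.close()


# Usage (needs network access):
#   url = "http://locators.bankofamerica.com/locator/locator/2129_Shattuck_Ave_94704_BERKELEY_CA/bank_branch_locations/"
#   print(was_redirected(url, fetch_final_url(url)))
```

If this prints True, the HTML you handed to BeautifulSoup is not the page you thought you were parsing.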
By monitoring network traffic with Firefox's Firebug, I found that the URL you are requesting actually returns a 301 Moved Permanently status code. Fortunately, a 301 response still includes a Location header telling you where the page has moved. In this case:
'http://locators.bankofamerica.com/locator/locator/LocatorAction.do?shouldTest=true'
That is the branch-locator page. You will have to start at this page, programmatically 'search' for the location(s) you would like, find the appropriate link, and perform a third request.
The site also uses cookies, so look into cookielib.
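As a starting point for the cookie handling, something like the following should work. It is a minimal sketch, not a complete scraper: make_session is a hypothetical helper name, and the search request and the branch link are left out because they depend on what the locator page actually contains. The try/except import lets it run with cookielib/urllib2 or with their Python 3 equivalents, http.cookiejar and urllib.request.

```python
try:
    import cookielib                    # Python 2
    import urllib2 as urlrequest
except ImportError:
    import http.cookiejar as cookielib  # Python 3 equivalents
    import urllib.request as urlrequest

# The Location header value reported above.
LOCATOR_URL = "http://locators.bankofamerica.com/locator/locator/LocatorAction.do?shouldTest=true"


def make_session():
    """Return (opener, jar); the opener stores cookies in jar and resends them."""
    jar = cookielib.CookieJar()
    opener = urlrequest.build_opener(urlrequest.HTTPCookieProcessor(jar))
    return opener, jar


# Usage (needs network access):
#   opener, jar = make_session()
#   html = opener.open(LOCATOR_URL).read()  # request 1: the locator page
#   # Requests 2 and 3 (the search and the branch page) should go through
#   # the same opener so the cookies set by request 1 are sent along.
```

Reusing one opener for all three requests is the point here; a fresh urlopen call each time would drop the cookies.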