I'm trying to create importable calendar events from a website. The website has the events listed in a standard HTML table.
I was wondering whether BeautifulSoup is the right way to tackle this problem, because I only get the first entry and then nothing.
import urllib2
from bs4 import BeautifulSoup

quote_page = "http://www.ellen-hartmann.de/babybasare.html"
page = urllib2.urlopen(quote_page)
soup = BeautifulSoup(page, "html.parser")
table = soup.find("table", {"border": "1"})
td = table.find("td", text="Veranstaltungstyp ")
print table
td_next = table.find_next("tr")
print td_next
I think you're stopping because you're using find(), which returns only the first matching tag, instead of find_all(), which returns all the matching tags. You then have to loop over the results:
import requests
from bs4 import BeautifulSoup
response = requests.get("http://www.ellen-hartmann.de/babybasare.html")
soup = BeautifulSoup(response.text, 'html.parser')
# now let's find every row in every table
for row in soup.find_all("tr"):
    # grab the cells within the row
    cells = row.find_all("td")
    # print the value of the cells as a list. This is the point where
    # you will need to filter the rows to figure out what is an event (and
    # what is not), determine the start date and time, and convert the values
    # to iCal format.
    print([c.text for c in cells])
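For the filtering and iCal step, here is a rough sketch of how it might look. I haven't checked the actual column layout of that page, so this assumes each event row starts with a date in DD.MM.YYYY form followed by a title; the function name rows_to_ical, the PRODID and the UID pattern are all made up for illustration. It uses only the standard library and treats every event as an all-day event.

```python
from datetime import datetime, timedelta, timezone


def escape_text(value):
    # Escape the characters that iCalendar TEXT values treat specially.
    return (value.replace("\\", "\\\\")
                 .replace(";", "\\;")
                 .replace(",", "\\,")
                 .replace("\n", "\\n"))


def rows_to_ical(rows, date_format="%d.%m.%Y"):
    """Build a minimal iCalendar string from lists of cell text.

    Assumption: an event row looks like [date, title, ...]. Rows whose
    first cell does not parse as a date (headers, blanks) are skipped.
    """
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    lines = [
        "BEGIN:VCALENDAR",
        "VERSION:2.0",
        "PRODID:-//example//table-to-ical//EN",  # hypothetical identifier
    ]
    for index, cells in enumerate(rows):
        if len(cells) < 2:
            continue
        try:
            start = datetime.strptime(cells[0].strip(), date_format)
        except ValueError:
            continue  # first cell is not a date, so not an event row
        end = start + timedelta(days=1)  # all-day events use an exclusive end date
        lines += [
            "BEGIN:VEVENT",
            f"UID:row-{index}@example.invalid",  # hypothetical UID scheme
            f"DTSTAMP:{stamp}",
            f"DTSTART;VALUE=DATE:{start:%Y%m%d}",
            f"DTEND;VALUE=DATE:{end:%Y%m%d}",
            f"SUMMARY:{escape_text(cells[1].strip())}",
            "END:VEVENT",
        ]
    lines.append("END:VCALENDAR")
    # iCalendar content lines are separated by CRLF.
    return "\r\n".join(lines) + "\r\n"


# Made-up sample rows, shaped like the lists printed by the loop above.
sample_rows = [
    ["Datum", "Veranstaltungstyp"],
    ["16.03.2019", "Babybasar, Musterstadt"],
]
calendar_text = rows_to_ical(sample_rows)
```

You would collect [c.text for c in cells] into a list instead of printing it, pass that list to the function, and write the returned string to a .ics file. If the real table has the date in a different column or format, adjust the index and date_format accordingly.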