I am very new to programming and started teaching myself web scraping with Python. I am scraping player data from multiple pages of a site and built a while loop that scrapes a 'next' button's href to get to the next player's page. Everything works fine, except for breaking the while loop after the last available player. The 'next' button greys out and has no link behind it, so at that point I want to stop the iteration and save everything to a csv.
My script looks like this:
# name base url and first page to start
BaseUrl = #url
PageUrl = #also url

while True:
    # scraping tables

    try:
        # retrieve link for 'next' player in order
        link = soup.find(attrs={"class": "go_to_next_player"}).get('href')
        # join base url and new link href
        PageUrl = BaseUrl + link
        if link is None:
            break
    except IndexError as e:
        print(e)
        break

    # writing to csv
I thought I could check whether the retrieved href is empty by testing 'is None' and then breaking, but I get this error:
In line > PageUrl = BaseUrl + link
TypeError: must be str, not NoneType
Help would be greatly appreciated! I am very new to this, so please disregard my beginner code.
You can check whether link is None before doing any operations with it, and then break the loop:
if link is not None:
    PageUrl = BaseUrl + link
else:
    break
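
For context, here is a minimal sketch of how that check could sit inside your loop. The BaseUrl value, the requests/BeautifulSoup fetch, and the csv step are assumptions on my part, since your post omits that setup:

import requests
from bs4 import BeautifulSoup

BaseUrl = "https://example.com"        # placeholder base url
PageUrl = BaseUrl + "/player/1"        # placeholder first player page

while True:
    # fetch and parse the current player's page
    soup = BeautifulSoup(requests.get(PageUrl).text, "html.parser")

    # ... scrape the player's tables here ...

    # the greyed-out button may still be present but carry no href
    next_button = soup.find(attrs={"class": "go_to_next_player"})
    link = next_button.get('href') if next_button else None

    if link is None:
        break                          # last player reached, stop looping
    PageUrl = BaseUrl + link           # safe: link is known to be a str here

# ... write everything to csv here ...

The key change is that the concatenation only runs after the None check, so the TypeError you saw can no longer occur.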