I am trying to scrape the profile URLs from a Yelp search results page using Beautiful Soup. This is the code I currently have:
from bs4 import BeautifulSoup

soup = BeautifulSoup(data, 'lxml')
for a in soup.find_all('a', href=True):
    with open(r'C:\Users\my.name\Desktop\Yelp-URLs.csv', "a") as f:
        print(a, file=f)  # writes the whole tag, not just its href
This gives me every href link on the page, not just the profile URLs. Additionally, I am getting the full tag string (a class lemon....) when I just need the business profile URLs.
Please help.
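You can narrow the match down to profile links by using select with a CSS attribute selector. The selector a[href^="/biz/"] matches only anchors whose href starts with /biz/, and reading the href attribute gives you the URL instead of the whole tag.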
# Open the file once, then append one profile href per line.
with open(r'/Users/my.name/Desktop/Yelp-URLs.csv', "a") as f:
    for a in soup.select('a[href^="/biz/"]'):
        print(a.attrs['href'], file=f)
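A search results page often links to the same business more than once, and the hrefs are relative, so you may also want to drop duplicates and turn them into full URLs. Here is a minimal sketch of that idea. The sample markup, the helper name extract_profile_urls, and the base URL are my own assumptions for illustration, not taken from the real page, and I use the built-in "html.parser" here only so the example runs without lxml installed.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for a real search results page.
SAMPLE_HTML = """
<a class="lemon" href="/biz/first-cafe-springfield">First Cafe</a>
<a class="lemon" href="/biz/first-cafe-springfield">Photo of First Cafe</a>
<a class="lemon" href="/search?find_desc=coffee">Next page</a>
<a class="lemon" href="/biz/second-diner-springfield">Second Diner</a>
"""

# Assumed base URL used to make the relative hrefs absolute.
BASE_URL = "https://www.yelp.com"


def extract_profile_urls(html, base_url=BASE_URL):
    """Return unique absolute profile URLs in the order they appear."""
    soup = BeautifulSoup(html, "html.parser")
    seen = set()
    urls = []
    # Only anchors whose href starts with /biz/ are selected.
    for a in soup.select('a[href^="/biz/"]'):
        url = urljoin(base_url, a["href"])
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


profile_urls = extract_profile_urls(SAMPLE_HTML)
for url in profile_urls:
    print(url)
```

If the real hrefs carry query strings, the same business can still show up under slightly different URLs, so you may need to strip those before comparing.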