I am having an issue extracting 'href'. Here is the HTML code:
<a href="https://www.akinsfoodltd.co.uk?utm_source=yell&utm_medium=referral&utm_campaign=yell" data-tracking="WL:CLOSED" class="btn btn-yellow businessCapsule--ctaItem" target="_blank" rel="nofollow noopener">
<div class="icon icon-Business-website" title="Visit Akin's Food Ltd's Website"></div> Website</a>
Here is my code:
from bs4 import BeautifulSoup
import requests
import csv

url = 'https://www.yell.com/ucs/UcsSearchAction.do?keywords=Food&location=United+Kingdom&scrambleSeed=1316051868'
header = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.114 Safari/537.36/8mqNJauL-25'}
response = requests.get(url, headers=header)
soup = BeautifulSoup(response.text, 'html.parser')

product = soup.find_all('div', 'row businessCapsule--mainRow')
#print(product)
for x in product:
    name = x.find('h2', {'itemprop': 'name'}).text
    address = x.find('span', {'itemprop': 'streetAddress'}).text
    post_code = x.find('span', {'itemprop': 'postalCode'}).text
    telp = x.find('span', 'business--telephoneNumber').text
    web = x.find('a', {'rel': 'nofollow noopener'})["href"]
    print(web)
The output in the terminal shows:
TypeError: 'NoneType' object is not subscriptable
You are trying to scrape hrefs out of containers in which they don't exist, which is why you encountered that error. The following is one of a few ways you can handle it:
import requests
from bs4 import BeautifulSoup

url = 'https://www.yell.com/ucs/UcsSearchAction.do?keywords=Food&location=United+Kingdom&scrambleSeed=1316051868'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.114 Safari/537.36/8mqNJauL-25'}

res = requests.get(url, headers=headers)
soup = BeautifulSoup(res.text, "html.parser")

for item in soup.find_all(class_='businessCapsule--mainRow'):
    name = item.find('h2', class_='businessCapsule--name').text
    phone = item.find(class_='business--telephoneNumber').text
    try:
        # Not every listing has a website link, so guard against find() returning None
        website = item.find('a', {'data-tracking': 'WL:CLOSED'}).get("href")
    except (TypeError, AttributeError):
        website = ""
    print(name, phone, website)
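Alternatively, instead of try/except you can check whether find() actually returned a tag before reading the attribute. Here is a minimal sketch of that approach, reusing the same soup and selectors as above; the if-check simply falls back to an empty string when a listing has no website link:

for item in soup.find_all(class_='businessCapsule--mainRow'):
    name = item.find('h2', class_='businessCapsule--name').text
    phone = item.find(class_='business--telephoneNumber').text
    link = item.find('a', {'data-tracking': 'WL:CLOSED'})
    # find() returns None when the tag is absent, and None["href"] is exactly what
    # raises TypeError: 'NoneType' object is not subscriptable in your original code
    website = link.get("href") if link is not None else ""
    print(name, phone, website)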