
TypeError in my terminal: 'NoneType' object is not subscriptable


I am having trouble extracting the 'href' attribute. Here is the HTML:

<a href="https://www.akinsfoodltd.co.uk?utm_source=yell&amp;utm_medium=referral&amp;utm_campaign=yell" data-tracking="WL:CLOSED" class="btn btn-yellow businessCapsule--ctaItem" target="_blank" rel="nofollow noopener">
<div class="icon icon-Business-website" title="Visit Akin's Food Ltd's Website"></div> Website</a>
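To check a selector without hitting the live site, the snippet can be parsed on its own. This is a minimal sketch that assumes bs4 is installed. Matching on data-tracking="WL:CLOSED" is an illustrative choice taken from this snippet, not something the page is guaranteed to use on every result.

```python
from bs4 import BeautifulSoup

# The anchor tag from the question, copied verbatim.
html = '''<a href="https://www.akinsfoodltd.co.uk?utm_source=yell&amp;utm_medium=referral&amp;utm_campaign=yell" data-tracking="WL:CLOSED" class="btn btn-yellow businessCapsule--ctaItem" target="_blank" rel="nofollow noopener">
<div class="icon icon-Business-website" title="Visit Akin's Food Ltd's Website"></div> Website</a>'''

soup = BeautifulSoup(html, "html.parser")

# find() returns a Tag when something matches, or None when nothing does.
link = soup.find("a", attrs={"data-tracking": "WL:CLOSED"})

# html.parser decodes &amp; to & inside attribute values.
href = link["href"]
print(href)
```

On this isolated snippet the lookup succeeds. The failure in the question only appears when a result row has no such link.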

Here is my code:

from bs4 import BeautifulSoup
import requests
import csv

url = 'https://www.yell.com/ucs/UcsSearchAction.do?keywords=Food&location=United+Kingdom&scrambleSeed=1316051868'
header = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.114 Safari/537.36/8mqNJauL-25'}
response = requests.get(url, headers=header)
soup = BeautifulSoup(response.text, 'html.parser')
product = soup.find_all('div', 'row businessCapsule--mainRow')
#print(product)

for x in product:
    name = x.find('h2', {'itemprop': 'name'}).text
    address = x.find('span', {'itemprop': 'streetAddress'}).text
    post_code = x.find('span', {'itemprop': 'postalCode'}).text
    telp = x.find('span', 'business--telephoneNumber').text
    web = x.find('a', {'rel': 'nofollow noopener'})["href"]  # this line raises the TypeError
    print(web)

In the terminal it shows:

TypeError: 'NoneType' object is not subscriptable
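The message means the code tried to index None. find() returns None when no element matches, and None["href"] is what raises this TypeError. A minimal reproduction, using a made-up result row that has no website link (assumes bs4 is installed):

```python
from bs4 import BeautifulSoup

# Hypothetical result row: it has a name but no website link.
row_html = '<div class="businessCapsule--mainRow"><h2>Example Shop</h2></div>'
row = BeautifulSoup(row_html, "html.parser")

link = row.find("a", {"rel": "nofollow noopener"})
print(link)  # None, because nothing matched

error_message = ""
try:
    link["href"]  # indexing None is what fails
except TypeError as exc:
    error_message = str(exc)
print(error_message)
```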

Solution

  • You are trying to read 'href' from result containers that don't contain a matching link. For those rows find() returns None, and indexing None with ["href"] raises the TypeError. The following is one way to handle that:

    import requests
    from bs4 import BeautifulSoup
    
    url = 'https://www.yell.com/ucs/UcsSearchAction.do?keywords=Food&location=United+Kingdom&scrambleSeed=1316051868'
    
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.114 Safari/537.36/8mqNJauL-25'}
    
    res = requests.get(url, headers=headers)
    soup = BeautifulSoup(res.text, "html.parser")
    for item in soup.find_all(class_='businessCapsule--mainRow'):
        name = item.find('h2', class_='businessCapsule--name').text
        phone = item.find(class_='business--telephoneNumber').text
        try:
            # find() returns None when the row has no website link; calling .get() on None raises AttributeError
            website = item.find('a', {'data-tracking': 'WL:CLOSED'}).get("href")
        except (TypeError, AttributeError):
            website = ""
        print(name, phone, website)
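An alternative to catching the exception is to test the result of find() before reading the attribute. This is a sketch under the same assumptions about Yell's markup as the answer above. get_href is a hypothetical helper name, and the two sample rows are invented stand-ins for the live page:

```python
from bs4 import BeautifulSoup


def get_href(container, default=""):
    """Return the website link's href inside a result row, or default if absent.

    Hypothetical helper; assumes the link carries data-tracking="WL:CLOSED"
    as in the question's HTML snippet.
    """
    link = container.find("a", attrs={"data-tracking": "WL:CLOSED"})
    if link is None:
        return default
    # Tag.get() also falls back to the default if the <a> has no href attribute.
    return link.get("href", default)


# Two made-up result rows: one with a website link, one without.
page = '''
<div class="row businessCapsule--mainRow">
  <h2 class="businessCapsule--name">Has Website</h2>
  <a href="https://example.com" data-tracking="WL:CLOSED">Website</a>
</div>
<div class="row businessCapsule--mainRow">
  <h2 class="businessCapsule--name">No Website</h2>
</div>
'''

soup = BeautifulSoup(page, "html.parser")
results = []
for item in soup.find_all(class_="businessCapsule--mainRow"):
    name = item.find("h2", class_="businessCapsule--name").get_text(strip=True)
    results.append((name, get_href(item)))

print(results)
```

Unlike wrapping the whole expression in try/except (TypeError, AttributeError), the explicit None check only covers the missing link and won't silently swallow unrelated errors raised by the same line.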