python beautifulsoup error-handling try-except

Why is the second except clause not working in Python?


import requests
from bs4 import BeautifulSoup

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36'
}

for x in range(60,61):
    url = 'https://example.com/page/'
    r = requests.get(url+str(x), headers = headers)
    soup = BeautifulSoup(r.content, features='lxml')

    articles = soup.find_all('article', class_='blog-view')
    
    for item in articles:
        title = item.find('h2', class_="entry-title").text
        
        if title == "Premium" or title == "Deleted" or title == "deleted":
            image_url = "None"
        else:
            try:
                image_url = item.find('div', class_='entry-content').p.img['src']
            except TypeError:
                image_url = item.find('div', class_='wp-caption').img['src']
            except AttributeError:
                image_url = "None"
            print(image_url)

Output

TypeError
Cell In [10], line 30
     29 try:
---> 30     image_url = item.find('div', class_='entry-content').p.img['src']
     31 except TypeError:

TypeError: 'NoneType' object is not subscriptable

During handling of the above exception, another exception occurred:

AttributeError
Cell In [10], line 32
     30     image_url = item.find('div', class_='entry-content').p.img['src']
     31 except TypeError:
---> 32     image_url = item.find('div', class_='wp-caption').img['src']
     33 except AttributeError:
     34     image_url = "None"

AttributeError: 'NoneType' object has no attribute 'img'

I am new to Python. I have written two except clauses, one for TypeError and one for AttributeError, so I expected to get "None" in the output at the end.

However, the second except clause is not executing. As far as I know, a try statement can have as many except clauses as needed, so why does the second one not run here? Is it because of the for loop or the indentation?


Solution

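  • The AttributeError is raised inside your "except TypeError:" block, not inside the try body. The except clauses of a try statement only catch exceptions raised in that try body, so the sibling "except AttributeError:" never sees an error raised by the first handler. The fallback lookup needs its own try/except nested inside the first handler. Try this: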

    import requests
    from bs4 import BeautifulSoup
    
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36'
    }
    
    for x in range(60,61):
        url = 'https://example.com/page/'
        r = requests.get(url+str(x), headers = headers)
        soup = BeautifulSoup(r.content, features='lxml')
    
        articles = soup.find_all('article', class_='blog-view')
        
        for item in articles:
            title = item.find('h2', class_="entry-title").text
            
            if title == "Premium" or title == "Deleted" or title == "deleted":
                image_url = "None"
            else:
                try:
                    image_url = item.find('div', class_='entry-content').p.img['src']
                except TypeError:
                    try:
                        image_url = item.find('div', class_='wp-caption').img['src']
                    except AttributeError:
                        image_url = "None"
                print(image_url)
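
To see the rule in isolation, here is a minimal sketch that needs no network access or BeautifulSoup. The function names flat and nested, and the use of plain None values in place of missing tags, are assumptions made up for this illustration; they are not part of the original code.

```python
def flat(primary, fallback):
    # Two sibling except clauses: both guard ONLY the try body.
    try:
        return primary["src"]      # None["src"] raises TypeError
    except TypeError:
        return fallback.img        # None.img raises AttributeError in the handler
    except AttributeError:
        return "None"              # not reached for an error raised in the handler above


def nested(primary, fallback):
    # The fallback lookup gets its own try/except inside the first handler.
    try:
        return primary["src"]
    except TypeError:
        try:
            return fallback.img
        except AttributeError:
            return "None"


print(nested(None, None))          # the nested handler returns the string "None"

try:
    flat(None, None)
except AttributeError as exc:
    # The AttributeError escaped flat() because no handler covered it.
    print("escaped from flat():", type(exc).__name__)
```

The flat version reproduces the "During handling of the above exception, another exception occurred" situation from the traceback, while the nested version falls through to "None" as intended.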
    

    If you still have issues, confirm the actual URL and your end goal (what are you after?), and I will update my answer.
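
If more fallbacks get added later, nesting one try inside another becomes hard to read. One possible alternative, sketched here under assumptions, is to try a list of lookups in order and catch both exception types around each one. The helper first_src, the EXTRACTORS list, and the FakeArticle stand-in (used only so the sketch runs without a real page) are hypothetical names for this example; the two lambdas mirror the lookups from the question.

```python
from types import SimpleNamespace


def first_src(item, extractors, default="None"):
    """Return the result of the first extractor that does not fail, else default."""
    for extract in extractors:
        try:
            return extract(item)
        except (TypeError, AttributeError):
            continue               # this lookup hit a missing tag, try the next one
    return default


# The two lookups from the question, in priority order.
EXTRACTORS = [
    lambda item: item.find('div', class_='entry-content').p.img['src'],
    lambda item: item.find('div', class_='wp-caption').img['src'],
]


class FakeArticle:
    """Hypothetical stand-in for a BeautifulSoup tag: find() returns a fixed value."""

    def __init__(self, result):
        self._result = result

    def find(self, *args, **kwargs):
        return self._result


# No matching div at all: both lookups fail, so the default is used.
empty = FakeArticle(None)
print(first_src(empty, EXTRACTORS))

# The div has a <p> but no <img> inside it; the second lookup succeeds.
caption_only = FakeArticle(
    SimpleNamespace(p=SimpleNamespace(img=None), img={"src": "b.jpg"})
)
print(first_src(caption_only, EXTRACTORS))
```

In the scraping loop this would replace the whole try block with image_url = first_src(item, EXTRACTORS), where item is the real article tag.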