Tags: python, debugging, exception, pycharm, try-except

Try/except not working as expected: "Except" error message is appended to passing result


I have code that is meant to find a graph on a webpage and build a link from it for web crawling. If the graph is not found, a try/except prints a message with the corresponding (player) link, and the loop moves on to the next one.

It's from a football valuation website, and I've reduced the list to two players for debugging: one is Kylian Mbappé (who has a graph on his page and should pass) and the other is Ansu Fati (who doesn't). Attempting to grab Ansu Fati's graph tag from his profile using BeautifulSoup results in a NoneType error.

The issue is that Mbappé's graph link does get picked up for processing downstream in the code, but the error/link message from the except clause is also printed to the console for him. That should only happen for Ansu Fati.

Here's the code:

import sys

import requests
from bs4 import BeautifulSoup

final_url_list = ['https://www.transfermarkt.us/kylian-mbappe/profil/spieler/342229','https://www.transfermarkt.com/ansu-fati/profil/spieler/466810']

for i in final_url_list:

    try:
        int_page = requests.get(i, headers = {'User-Agent':'Mozilla/5.0'}).text

    except requests.exceptions.Timeout:
        sys.exit(1)

    parsed_int_page = BeautifulSoup(int_page,'lxml')


    try:
        graph_container = parsed_int_page.find('div', class_='large-7 columns small-12 marktwertentwicklung-graph')
        graph_a = graph_container.find('a')
        graph_link = graph_a.get('href')
        final_url_list.append('https://www.transfermarkt.us' + graph_link)
    except:
        pass
        print("Graph error:" + i)

I stepped through the code with PyCharm's debugger, and the whole except clause appears to be skipped. When I run it in the console, though, the "Graph error: link" message is printed for both players. I'm not sure what in the code makes the try/except behave this way.


Solution

  • The line

    except None:
    

    tries to catch an exception whose type is None. That can never match, because None is not an exception class.

    Try changing that line to

    except AttributeError:
    

    Doing so will result in the following output:

    Graph error:https://www.transfermarkt.com/ansu-fati/profil/spieler/466810
    Graph error:https://www.transfermarkt.us/kylian-mbappe/marktwertverlauf/spieler/342229
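
    To see why AttributeError is the exception to catch, here is a minimal, network-free sketch. BeautifulSoup's find() returns None when nothing matches, so the next call is made on None. FakeTag and graph_href are hypothetical names made up for this illustration; FakeTag only stands in for a parsed tag and is not part of BeautifulSoup.

```python
class FakeTag:
    """Hypothetical stand-in for a parsed tag (not a BeautifulSoup class)."""

    def __init__(self, href=None, child=None):
        self.href = href
        self.child = child

    def find(self, name):
        # Mimics find() returning None when nothing matches:
        # child defaults to None, and the tag name is ignored here.
        return self.child

    def get(self, key):
        return self.href if key == 'href' else None


def graph_href(graph_container):
    """Return the graph link's href, or None if a lookup step fails."""
    try:
        return graph_container.find('a').get('href')
    except AttributeError:
        # Raised when graph_container, or the <a> tag inside it, is None.
        return None


with_graph = FakeTag(child=FakeTag(href='/kylian-mbappe/marktwertverlauf/spieler/342229'))
print(graph_href(with_graph))  # page with a graph: the href is returned
print(graph_href(None))        # page without a graph container: None
```

    Catching only AttributeError keeps unrelated failures, such as a network error, from being reported as a missing graph.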
    

    There's an additional issue: you're modifying the list that you're iterating over. That is bad practice in general, and here it is what causes the unexpected behavior you're seeing.
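
    The effect of appending to a list while looping over it can be reproduced without any scraping. This is a small sketch with made-up placeholder strings instead of real URLs:

```python
urls = ['profile/mbappe', 'profile/fati']  # placeholder values, not real URLs
visited = []

for url in urls:
    visited.append(url)
    if url == 'profile/mbappe':
        # Simulates final_url_list.append(...) inside the loop body.
        urls.append('graph/mbappe')

# The for loop also reaches the element that was appended mid-iteration.
print(visited)  # ['profile/mbappe', 'profile/fati', 'graph/mbappe']
```

    The third iteration is the one nobody asked for: the loop treats the appended graph link as another page to scrape.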

    Because you append each graph link to the list you're looping over, the loop later visits that graph URL as well. That page has no graph container, so the except branch prints the error message for Mbappé's graph link. To fix this, change the first couple of lines in your script to this:

    url_list = ['https://www.transfermarkt.us/kylian-mbappe/profil/spieler/342229','https://www.transfermarkt.com/ansu-fati/profil/spieler/466810']
    final_url_list = []
    
    for i in url_list:
    

    This way, you're appending the graph links to a different list, so you won't try to scrape links that you shouldn't be scraping. All of the "graph links" end up in final_url_list.
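
    Putting both changes together, the loop could be structured roughly as follows. This is a sketch under stated assumptions, not the original script: collect_graph_links, fake_get_graph_href and FAKE_HREFS are hypothetical names, and the fake lookup replaces the requests/BeautifulSoup steps so the example runs without network access.

```python
BASE_URL = 'https://www.transfermarkt.us'


def collect_graph_links(url_list, get_graph_href):
    """Return the graph links for url_list without modifying url_list."""
    final_url_list = []
    for url in url_list:
        try:
            href = get_graph_href(url)
        except AttributeError:
            # Only the "graph not found" case is handled here.
            print("Graph error:" + url)
            continue
        final_url_list.append(BASE_URL + href)
    return final_url_list


# Hypothetical stand-in for the download-and-parse step.
FAKE_HREFS = {
    'https://www.transfermarkt.us/kylian-mbappe/profil/spieler/342229':
        '/kylian-mbappe/marktwertverlauf/spieler/342229',
}


def fake_get_graph_href(url):
    href = FAKE_HREFS.get(url)
    if href is None:
        # Same error the real code hits when find() returns None.
        raise AttributeError("'NoneType' object has no attribute 'find'")
    return href


url_list = [
    'https://www.transfermarkt.us/kylian-mbappe/profil/spieler/342229',
    'https://www.transfermarkt.com/ansu-fati/profil/spieler/466810',
]
final_url_list = collect_graph_links(url_list, fake_get_graph_href)
print(final_url_list)
```

    In the real script, the function passed as get_graph_href would perform the requests.get call and the find('div', ...), find('a') and get('href') lookups from the question.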