Tags: python-3.x, web-scraping, beautifulsoup, css-selectors, data-extraction

How to select elements without a class using BeautifulSoup


I am scraping the FBref website to get specific player info that I can use for further analysis. I have selected the table I want to scrape. The information I want is in <tr> tags without any class attribute. The issue is that this table also has many header rows in <tr> tags that do have a class name.

import requests
from bs4 import BeautifulSoup
from time import sleep
url = "https://fbref.com/en/comps/9/2021-2022/stats/2021-2022-Premier-League-Stats"

response = requests.get(url).text.replace('<!--', '').replace('-->', '')

soup = BeautifulSoup(response, "html.parser")

I have selected the desired table I want to scrape. I want to select <tr> tags that don't have any class attribute because that's where the information I want is located.

players_table = soup.select("table#stats_standard tbody tr", class_ =None)

I have then looped through the players_table so that I can get each player's info like name, country, position, etc.

for player in players_table:
    player_name = player.find("td", attrs={"data-stat" : "player"}).a.text
    print(player_name)
    sleep(2)

But now the problem is that when my loop reaches a <tr class="thead"> row, it looks for the player cell, then for its <a> tag, and then for the text in that <a> tag. This <tr class="thead"> row doesn't have any <a> tags, so find() returns None and my code breaks with the error message 'NoneType' object has no attribute 'a'.

My code prints the names of the players until it reaches this <tr class="thead"> row with no <a> tag, and then it fails. I have even tried to decompose or clear this row, but that still doesn't work.

player.find(".thead").decompose()

So my question is: how can I select only the <tr> tags that don't have any class, so that when my loop reaches a <tr class="thead"> tag, it just skips it? I have tried doing that by passing class_=None when selecting the table rows:

players_table = soup.select("table#stats_standard tbody tr", class_ =None)

But this didn't solve anything. I need your help on this, please.


Solution

  • If you only want to exclude the sub-headers, adjust your selector so that it only selects the <tr> elements without the class thead:

    soup.select('table#stats_standard tbody tr:not(.thead)')
    

    or, more specific to the title of your question, only the <tr> elements that do not have a class attribute at all:

    soup.select('table#stats_standard tbody tr:not([class])')
    

    Example

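    import requests
    from bs4 import BeautifulSoup

    url = "https://fbref.com/en/comps/9/2021-2022/stats/2021-2022-Premier-League-Stats"

    # Remove the HTML comment markers so that markup inside comments is parsed as well
    response = requests.get(url).text.replace('<!--', '').replace('-->', '')

    # Name the parser explicitly so BeautifulSoup does not have to guess one
    soup = BeautifulSoup(response, "html.parser")

    # Select only the rows that have no class attribute, i.e. the player rows
    for player in soup.select('table#stats_standard tbody tr:not([class])'):
        player_name = player.find("td", attrs={"data-stat" : "player"}).a.text
        print(player_name)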
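
To try the two selectors without hitting the site, here is a minimal offline sketch. The HTML snippet and the player_names helper are made up for illustration; the snippet only imitates the shape of the table described in the question (player rows without a class, plus one sub-header row with class="thead"). It also shows a defensive check for a missing cell or link, in case a row slips through the selector. As far as I can tell, select() only filters by the CSS selector string, while class_ is a find_all() keyword, which would explain why passing class_=None to select() changed nothing.

```python
from bs4 import BeautifulSoup

# Hypothetical miniature of the stats table: two player rows and one
# repeated sub-header row carrying class="thead".
html = """
<table id="stats_standard">
  <tbody>
    <tr><td data-stat="player"><a href="#">Player A</a></td></tr>
    <tr class="thead"><th data-stat="player">Player</th></tr>
    <tr><td data-stat="player"><a href="#">Player B</a></td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Every row in the body, including the sub-header row.
all_rows = soup.select("table#stats_standard tbody tr")

# Rows that lack the "thead" class vs. rows with no class attribute at all.
no_thead = soup.select("table#stats_standard tbody tr:not(.thead)")
no_class = soup.select("table#stats_standard tbody tr:not([class])")


def player_names(rows):
    """Collect player names, skipping rows that have no player link."""
    names = []
    for row in rows:
        cell = row.find("td", attrs={"data-stat": "player"})
        if cell is None or cell.a is None:
            continue  # e.g. a sub-header row without a player cell
        names.append(cell.a.get_text(strip=True))
    return names


print(len(all_rows), len(no_thead), len(no_class))
print(player_names(all_rows))
```

On this snippet both :not() selectors return the same two rows. They would differ on a table where some data rows carry another class: tr:not(.thead) keeps those rows, while tr:not([class]) drops them.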