python web-scraping beautifulsoup python-requests

Why does Beautiful Soup's find_all not find all matching elements on the page?

What am I trying to achieve?

I am trying to scrape the "Player Shooting" table from this webpage. More specifically I want to return the tr tags from the stats_shooting table as a list (with one tr per element of the list).

What have I done so far?

I return the web page using the block below:

import requests
from bs4 import BeautifulSoup as bs

# Request page
all_players_shooting_url = "https://fbref.com/en/comps/9/shooting/Premier-League-Stats"
html = requests.get(all_players_shooting_url)
assert html.status_code == 200, f"Status code of {html.status_code} was returned."
# Parse the response body (a string), not the Response object itself
soup = bs(html.text, 'html.parser')

Where have I encountered problems, and what have I done to resolve them?

I have then tried a number of approaches to get to the data that I need:

Simple find_all method - this gives me the outer div, but I can't search it further to get the tr's

granular_search = soup.find_all("div", {"id": "all_stats_shooting"})
print(f"Granular search returns {len(granular_search)} results. Expected 1.")

Brute force return of all table tags from the page. This doesn't return the table I care about...

broad_search = soup.find_all("table", recursive=True)
print(f"Broad search returns {len(broad_search)} results. Expected 3.")

Some joy returning the table using the CSS Selector (I actually get something back...) but not able to search it further to get the tr's...

css_search = soup.select("#all_stats_shooting")
print(f"CSS search returns {len(css_search)} results. Expected 1.")
further_search = css_search[0].find_all("tr")
print(f"Further search returns {len(further_search)} results. Expected > 0.")

I can attempt to return all elements with a tr tag, but again it only returns the first two tables...

tr_search = soup.find_all('tr')
print(f"Tr search returns {len(tr_search)} results. Expected > 44")

Please note: I have also developed a solution using Selenium. It works but it's slow and unstable. With this in mind, some of the existing answers e.g. this one don't really solve my problem.

Solution

The main issue here is that the table you are trying to find is stored inside HTML comments, so you have to strip the comment markers first:

soup = bs(html.text.replace('<!--','').replace('-->',''), 'html.parser')
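As a rough sketch of why this is needed, and of an alternative that leaves the raw HTML untouched: the parser exposes commented-out markup as a single Comment string, so find_all never sees the tags inside it. The snippet below uses a small made-up HTML string (an assumption for illustration, not the real fbref markup) and re-parses each comment that contains a table.

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical, simplified stand-in for the page: the table sits inside a comment.
SAMPLE_HTML = """
<div id="all_stats_shooting">
  <!--
  <table id="stats_shooting">
    <thead><tr><th>Player</th><th>Goals</th></tr></thead>
    <tbody>
      <tr><td>Player A</td><td>3</td></tr>
      <tr><td>Player B</td><td>1</td></tr>
    </tbody>
  </table>
  -->
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# The commented-out markup is one Comment string, so no <tr> tags are found.
rows_before = soup.find_all("tr")

# Collect the comment nodes that contain a table and parse their text separately.
comments = soup.find_all(string=lambda s: isinstance(s, Comment) and "<table" in s)
hidden_docs = [BeautifulSoup(comment, "html.parser") for comment in comments]
rows_after = [tr for doc in hidden_docs for tr in doc.find_all("tr")]

print(len(rows_before), len(rows_after))
```

Replacing the comment markers in the raw text is shorter; parsing the Comment nodes may be preferable if the page also contains genuine comments that should stay hidden.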

Then, to select only the data rows, adjust your CSS selector:

soup.select("#all_stats_shooting table tr:has(td)")
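To illustrate what the :has(td) part does, here is a minimal sketch on a made-up table (the markup and player names are assumptions, with the comment markers already removed). A header row that contains only th cells is skipped, while rows with at least one td are kept.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup; the real page has many more columns.
SAMPLE_HTML = """
<div id="all_stats_shooting">
  <table id="stats_shooting">
    <thead><tr><th>Player</th><th>Goals</th></tr></thead>
    <tbody>
      <tr><td>Player A</td><td>3</td></tr>
      <tr><td>Player B</td><td>1</td></tr>
    </tbody>
  </table>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Every row, including the header row made of <th> cells only.
all_rows = soup.select("#all_stats_shooting table tr")

# Only rows that contain at least one <td> descendant.
data_rows = soup.select("#all_stats_shooting table tr:has(td)")

# One list of cell texts per data row.
cells = [[td.get_text(strip=True) for td in tr.find_all("td")] for tr in data_rows]
print(cells)
```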

To scrape the table and store it directly in a DataFrame, use pandas. Check and adapt the following question: How to extract hidden table from fbref website by id?
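A minimal sketch of the pandas route, assuming the same comment-stripping trick and a made-up HTML string in place of the real response body. It is not necessarily the approach taken in the linked question. Note that pandas.read_html needs an HTML parser backend such as lxml to be installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical, simplified stand-in for html.text from the request.
SAMPLE_HTML = """
<div id="all_stats_shooting">
  <!--
  <table id="stats_shooting">
    <thead><tr><th>Player</th><th>Goals</th></tr></thead>
    <tbody>
      <tr><td>Player A</td><td>3</td></tr>
      <tr><td>Player B</td><td>1</td></tr>
    </tbody>
  </table>
  -->
</div>
"""

# Strip the comment markers so the table becomes regular markup.
uncommented = SAMPLE_HTML.replace("<!--", "").replace("-->", "")

# read_html returns a list of DataFrames; attrs narrows it to the table with this id.
tables = pd.read_html(StringIO(uncommented), attrs={"id": "stats_shooting"})
df = tables[0]
print(df)
```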