I am creating a pandas DataFrame with several hundred rows by scraping a sports website. I am trying to go through the rows and drop some of them based on the value of a certain column. I've looked through W3 and other sites for the correct method, but nothing I've found really matches my need. My code is listed below. Does anyone know of a good way to accomplish this?
import pandas as pd

def rec_career():
    url = 'https://www.pro-football-reference.com/years/2022/receiving.htm'
    base_url = 'https://www.pro-football-reference.com'
    # Establish dictionary
    player_links = dict()
    # Use pandas to read table
    table = pd.read_html(url, attrs={'id': 'receiving'})[0]
    table.head()
    table.index = range(len(table))
    for i, row in table.iterrows():
        if row[4] != 'WR' or 'TE':
            table = table.drop(index=i)
    print(table)

rec_career()
The above code returns an empty DataFrame, so it's obviously just going through and deleting all the rows, but I am unsure why it is doing that. I'm basically trying to drop players from the df that aren't receivers.
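Your condition row[4] != 'WR' or 'TE' is evaluated as (row[4] != 'WR') or 'TE', and the non-empty string 'TE' is always truthy, so every row gets dropped. Beyond that, avoid using a for loop in pandas, as pandas has faster and more concise methods: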
...
table = pd.read_html(url, attrs={'id': 'receiving'})[0]
table.head()
table.index = range(len(table))
table = table[table.Pos.isin(['WR', 'TE'])]
print(table)
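To see both points without hitting the website, here is a minimal sketch on a small made-up DataFrame. The player names and the sample rows are hypothetical stand-ins for the scraped table; only the Pos column name comes from the code above.

```python
import pandas as pd

# Hypothetical sample data standing in for the scraped table.
table = pd.DataFrame({
    "Player": ["Player A", "Player B", "Player C", "Player D"],
    "Pos": ["WR", "QB", "TE", "RB"],
})

# Why the original loop dropped every row:
# `pos != 'WR' or 'TE'` parses as `(pos != 'WR') or 'TE'`, and the
# non-empty string 'TE' is truthy, so the test passes for every row.
always_true = all(bool(pos != "WR" or "TE") for pos in table["Pos"])

# Vectorized filter: keep only rows whose Pos is WR or TE.
receivers = table[table["Pos"].isin(["WR", "TE"])]

# The same filter written as explicit comparisons joined with `|`.
receivers_alt = table[(table["Pos"] == "WR") | (table["Pos"] == "TE")]

print(always_true)
print(receivers)
```

If you do want to keep the loop, the condition would have to be written as row[4] not in ('WR', 'TE'), but the isin mask does the same job in one step.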