I'm trying to find out whether a given player's steals in a single game are Poisson distributed. The idea was to create a bucket for each decimal value of steals per game. For example, for all players who averaged 1.2 steals per game in a season over the past 10 or so years, what is the distribution of their single-game steal numbers in those seasons? How many total games with 0 steals, how many with 1 steal, and so on? I was going to look at the variance and histogram of this data to see whether it resembles a Poisson distribution with lambda = 1.2.
After several hours sifting through the nba_api documentation and then resorting to ChatGPT, I've produced the following monstrosity of code.
The code doesn't work. It just runs forever, and then gives me some disconnection/runtime error or the following error:
ValueError: No objects to concatenate.
I tried the simplified instance below, where I just tried to create a list of players with 1.2 steals per game and then find the total number of games with 0 steals in those respective seasons. filtered_seasons should be a dataframe of qualifying seasons.
from nba_api.stats.static import players
from nba_api.stats.endpoints import playercareerstats, playergamelog
import pandas as pd
import datetime
# Get a list of all NBA players
nba_players = players.get_players()
# Initialize a list to store player seasons
player_seasons = []
# Initialize a list to store game logs for players with 1.2 steals per game
player_game_logs = []
# Iterate through the list of players
for player in nba_players:
    player_id = player['id']
    # Retrieve player career statistics
    career_stats = playercareerstats.PlayerCareerStats(player_id=player_id)
    # Get the DataFrame of player career stats
    career_stats_df = career_stats.get_data_frames()[0]
    # Filter for seasons with exactly 1.2 steals per game
    filtered_seasons = career_stats_df[career_stats_df['STL'] == 1.2]
    if not filtered_seasons.empty:
        player_seasons.append(filtered_seasons)
        # Iterate through the filtered seasons and fetch game logs
        for season in filtered_seasons['SEASON_ID']:
            game_log = playergamelog.PlayerGameLog(player_id=player_id, season=season)
            game_log_df = game_log.get_data_frames()[0]
            player_game_logs.append(game_log_df)
# Concatenate the filtered DataFrames
result_seasons_df = pd.concat(player_seasons, ignore_index=True)
result_game_logs_df = pd.concat(player_game_logs, ignore_index=True)
# Find games where players recorded 0 steals
zero_steals_games = result_game_logs_df[result_game_logs_df['STL'] == 0]
# Count the total number of games with 0 steals
total_zero_steals_games = len(zero_steals_games)
# Print the result
print(f"Total games with 0 steals: {total_zero_steals_games}")
Your code probably seems to loop "infinitely" because you are going through every single player who has ever played in the NBA (or at least every player retrievable through the API). Making a large number of API calls in a short period of time can lead to rate limiting, which results in slow response times, timeouts, or even temporary suspension of access.
Here is some code I wrote that plots the top 10 players by PPG for the 2022-23 season (again, it has to go through every player in the database). It also prints each player's name as it goes, so you can keep track of the progress (and witness how slow it can be).
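One way to soften the rate limiting is to pause before every request and retry a few times when a call fails. Below is a minimal sketch of that idea. The helper name call_with_retry and the default pause and backoff values are my own guesses, not anything nba_api provides or recommends, and fetch is just any zero-argument function that performs one API call.

```python
import time


def call_with_retry(fetch, retries=3, pause=0.6, backoff=2.0):
    """Call fetch() politely: sleep before each attempt, retry on failure.

    fetch   -- zero-argument callable that performs a single API request
    retries -- maximum number of attempts (hypothetical default)
    pause   -- seconds to wait before the first attempt (a guess, tune it)
    backoff -- factor by which the wait grows after each failed attempt
    """
    delay = pause
    last_error = None
    for _ in range(retries):
        time.sleep(delay)  # throttle: wait before every request
        try:
            return fetch()
        except Exception as exc:  # broad on purpose: timeouts, resets, ...
            last_error = exc
            delay *= backoff  # wait longer before the next attempt
    # Every attempt failed, so surface the last error to the caller
    raise last_error
```

Inside the loop you would then wrap each request, for example call_with_retry(lambda: playercareerstats.PlayerCareerStats(player_id=player_id).get_data_frames()[0]). It will not make the full run fast, but it should fail less often partway through.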
from nba_api.stats.static import players
from nba_api.stats.endpoints import playercareerstats, playergamelog
import pandas as pd
import matplotlib.pyplot as plt
# Define the desired season
desired_season = "2022-23" # Replace with the desired season
# Get a list of all NBA players
nba_players = players.get_players()
# Initialize a list to store player data
player_data = []
# Loop through the list of players
for player in nba_players:
    player_name = player['full_name']
    player_id = player['id']
    # Retrieve player career statistics
    career_stats = playercareerstats.PlayerCareerStats(player_id=player_id)
    # Get the DataFrame of player career stats
    career_stats_df = career_stats.get_data_frames()[0]
    # Filter for the desired season
    filtered_season = career_stats_df[career_stats_df['SEASON_ID'] == desired_season]
    if not filtered_season.empty:
        print(f"Processing {player_name}...")
        # Get the player's game log for the desired season
        game_log = playergamelog.PlayerGameLog(player_id=player_id, season=desired_season)
        game_log_df = game_log.get_data_frames()[0]
        # Calculate the average points per game (PPG) for this player in the season
        avg_ppg = game_log_df['PTS'].mean()
        player_data.append({
            'Player Name': player_name,
            'PPG': avg_ppg
        })
# Create a DataFrame from the collected player data
player_data_df = pd.DataFrame(player_data)
# Sort the players by PPG in descending order and select the top 10
top_10_players_df = player_data_df.sort_values(by='PPG', ascending=False).head(10)
# Plot a bar chart of the top 10 players by PPG
plt.barh(top_10_players_df['Player Name'], top_10_players_df['PPG'])
plt.xlabel('Points Per Game (PPG)')
plt.ylabel('Player Name')
plt.title(f'Top 10 Players by PPG in {desired_season}')
plt.gca().invert_yaxis() # Invert the y-axis to show the top player at the top
plt.show()
As for your code specifically, it runs, but it takes a long time because it makes a ton of API calls. The only real problem is that no season ever matched the 1.2 steals filter, so both lists are still empty at the end and pd.concat fails with "No objects to concatenate". (As far as I can tell, PlayerCareerStats returns season totals by default, so STL there is a whole-number total rather than a per-game average, and comparing it to 1.2 can never match.) Here is an updated version of your code that prevents the error from happening and also contains some print lines for tracing purposes:
from nba_api.stats.static import players
from nba_api.stats.endpoints import playercareerstats, playergamelog
import pandas as pd
import datetime
# Set the steals per game to look for
SPG = 1
# Get a list of all NBA players
nba_players = players.get_players()
# Initialize a list to store player seasons
player_seasons = []
# Initialize a list to store game logs for players with the desired steals per game
player_game_logs = []
# Iterate through the list of players
for player in nba_players:
    player_id = player['id']
    player_name = player['full_name']
    print(f"Processing {player_name}...") # Print the player name
    # Retrieve player career statistics as per-game averages
    # (per_mode36="PerGame"; the default mode is season totals, where STL is a whole-number total)
    career_stats = playercareerstats.PlayerCareerStats(player_id=player_id, per_mode36="PerGame")
    # Get the DataFrame of player career stats
    career_stats_df = career_stats.get_data_frames()[0]
    # Filter seasons with the desired steals per game
    filtered_seasons = career_stats_df[career_stats_df['STL'] == SPG]
    if not filtered_seasons.empty:
        player_seasons.append(filtered_seasons)
        print(f"{player_name} has {len(filtered_seasons)} season(s) with {SPG} steal(s) per game.")
        # Iterate through the filtered seasons and fetch game logs
        for season in filtered_seasons['SEASON_ID']:
            game_log = playergamelog.PlayerGameLog(player_id=player_id, season=season)
            game_log_df = game_log.get_data_frames()[0]
            player_game_logs.append(game_log_df)
# Check if there is data to concatenate
if player_seasons:
    # Concatenate the filtered DataFrames
    result_seasons_df = pd.concat(player_seasons, ignore_index=True)
    result_game_logs_df = pd.concat(player_game_logs, ignore_index=True)
    # Find games where players recorded 0 steals
    zero_steals_games = result_game_logs_df[result_game_logs_df['STL'] == 0]
    # Count the total number of games with 0 steals
    total_zero_steals_games = len(zero_steals_games)
    # Print the result
    print(f"Total games with 0 steals: {total_zero_steals_games}")
else:
    print("No data to concatenate. No players meet the criteria.")
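Once you have the combined game logs, something along these lines could do the Poisson comparison you described, counting games with 0, 1, 2, ... steals and putting them next to what a Poisson distribution would predict. It is only a sketch: poisson_pmf and compare_to_poisson are names I made up, it uses just the standard library, and it assumes the steal counts are non-negative whole numbers.

```python
from collections import Counter
from math import exp, factorial


def poisson_pmf(k, lam):
    """Probability of exactly k events under a Poisson(lam) distribution."""
    return exp(-lam) * lam ** k / factorial(k)


def compare_to_poisson(steals, lam=None):
    """Compare observed single-game steal counts with Poisson expectations.

    steals -- iterable of per-game steal counts (e.g. a DataFrame column)
    lam    -- Poisson rate; if omitted, the sample mean is used

    Returns (mean, variance, rows), where each row is
    (k, observed games with k steals, expected games with k steals).
    """
    steals = [int(x) for x in steals]
    if not steals:
        raise ValueError("no games to analyse")
    n = len(steals)
    mean = sum(steals) / n
    # Population variance; for Poisson data it should be close to the mean
    variance = sum((x - mean) ** 2 for x in steals) / n
    if lam is None:
        lam = mean
    counts = Counter(steals)
    rows = [
        (k, counts.get(k, 0), n * poisson_pmf(k, lam))
        for k in range(max(steals) + 1)
    ]
    return mean, variance, rows
```

For example, compare_to_poisson(result_game_logs_df['STL'], lam=1.2) would give you the mean, the variance and an observed-versus-expected table for the 1.2 bucket. If the variance comes out clearly above or below the mean, that already hints the data is not Poisson.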
I hope this answers your question.