Problem with football analytics code [Python] Noob


I have been learning to code in Python so that I can use it for football data analytics.

I have asked before and all the other issues have been solved, but I don't know how the lambda function works in the code linked below.

https://stackoverflow.com/a/62039153/13621874

The issue is with this lambda function. I have tried it and it's not working, and I don't know how to fix it. Without it, the filters don't work.

Please can someone help me:

## pass_comp: completed pass
## pass_no: unsuccessful pass

## iterating through the pass dataframe
for row_num, passed in pass_df.iterrows():   

    if passed['player_name'] == player_name:
        ## for away side
        x_loc = passed['location'][0]
        y_loc = passed['location'][1]

        pass_id = passed['id']
        summed_result = sum(breceipt_df.iloc[:, 14].apply(lambda x: pass_id in x))

        if summed_result > 0:
            ## if pass made was successful
            color = 'blue'
            label = 'Successful'
            pass_comp += 1
        else:
            ## if pass made was unsuccessful
            color = 'red'
            label = 'Unsuccessful'
            pass_no += 1

        ## plotting circle at the player's position
        shot_circle = plt.Circle((pitch_length_X - x_loc, y_loc), radius=2, color=color, label=label)
        shot_circle.set_alpha(alpha=0.2)
        ax.add_patch(shot_circle)

        ## parameters for making the arrow
        pass_x = 120 - passed['pass_end_location'][0]
        pass_y = passed['pass_end_location'][1] 
        dx = ((pitch_length_X - x_loc) - pass_x)
        dy = y_loc - pass_y

        ## making an arrow to display the pass
        pass_arrow = plt.Arrow(pitch_length_X - x_loc, y_loc, -dx, -dy, width=1, color=color)

        ## adding arrow to the plot
        ax.add_patch(pass_arrow)


Thanks in advance for your help!


Solution

  • The lambda function checks whether the pass_id of Messi's pass also appears in the 'related_events' column of breceipt_df. If it does, the apply call returns at least one True row, so the sum of the True values is greater than 0, which indicates a successful pass. If there are no True values, the sum is 0 and the pass is recorded as unsuccessful.
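
To make the mechanics concrete, here is a minimal sketch on a toy frame. The ids and the helper name was_received are made up for illustration. The column is selected by name rather than with .iloc[:, 14], on the assumption that position 14 may not point at related_events once dropna has removed columns.

```python
import pandas as pd

# Toy stand-in for breceipt_df; the ids are invented for illustration.
breceipt_df = pd.DataFrame({
    "id": ["r1", "r2", "r3"],
    "related_events": [["p1"], ["p2", "x9"], ["p4"]],
})

def was_received(pass_id, receipts):
    """Return True if pass_id appears in any row's related_events list."""
    # apply() runs the lambda once per row and yields a boolean Series
    matches = receipts["related_events"].apply(lambda events: pass_id in events)
    # True counts as 1, so the sum is the number of matching rows
    return int(matches.sum()) > 0

print(was_received("p2", breceipt_df))  # True
print(was_received("p3", breceipt_df))  # False
```

If the name-based lookup raises a KeyError on real data, that would suggest the related_events column was dropped earlier, which is worth checking before debugging the lambda itself.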

    So it's just checking whether both IDs are present. I changed it slightly: instead of using a lambda function, it simply checks whether pass_id is in the list built from the related_events column. That column holds nested lists, so it needs to be flattened first (which I do in the code).

    So try putting this in its place:

    ## pass_comp: completed pass
    ## pass_no: unsuccessful pass
    
    ## iterating through the pass dataframe
    for row_num, passed in pass_df.iterrows():   
    
        if passed['player_name'] == player_name:
            ## for away side
            x_loc = passed['location'][0]
            y_loc = passed['location'][1]
    
            pass_id = passed['id']
           
            ######### ALTERED CODE ###################
            events_list = [item for sublist in breceipt_df['related_events'] for item in sublist]
            if pass_id in events_list:
                ## if pass made was successful
                color = 'blue'
                label = 'Successful'
                pass_comp += 1
            else:
                ## if pass made was unsuccessful
                color = 'red'
                label = 'Unsuccessful'
                pass_no += 1
           ########################################    
    
    
            ## plotting circle at the player's position
            shot_circle = plt.Circle((pitch_length_X - x_loc, y_loc), radius=2, color=color, label=label)
            shot_circle.set_alpha(alpha=0.2)
            ax.add_patch(shot_circle)
    
            ## parameters for making the arrow
            pass_x = 120 - passed['pass_end_location'][0]
            pass_y = passed['pass_end_location'][1] 
            dx = ((pitch_length_X - x_loc) - pass_x)
            dy = y_loc - pass_y
    
            ## making an arrow to display the pass
            pass_arrow = plt.Arrow(pitch_length_X - x_loc, y_loc, -dx, -dy, width=1, color=color)
    
            ## adding arrow to the plot
            ax.add_patch(pass_arrow)
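
The altered code rebuilds the flattened list for every pass. A possible refinement, sketched here on toy data, is to flatten once before the loop and keep the ids in a set. The ids and the helper name collect_received_ids are hypothetical, and the None row is only an assumption about how a receipt without related events might look.

```python
import pandas as pd

# Toy stand-in for breceipt_df; ids are invented, and the None row mimics
# a receipt that has no related events recorded.
breceipt_df = pd.DataFrame({
    "related_events": [["p1", "x7"], None, ["p4"]],
})

def collect_received_ids(receipts):
    """Flatten the nested related_events lists into one set of ids."""
    received = set()
    for events in receipts["related_events"]:
        # skip missing values (None / NaN), which are not lists
        if isinstance(events, list):
            received.update(events)
    return received

# build the set once, before looping over the passes
received_ids = collect_received_ids(breceipt_df)

for pass_id in ["p1", "p2", "p4"]:
    label = "Successful" if pass_id in received_ids else "Unsuccessful"
    print(pass_id, label)
```

Membership tests on a set do not scan the whole collection, so this mainly pays off when a match has many passes and receipts. The result per pass should be the same as with the list.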
    

    Full Code

    import matplotlib.pyplot as plt
    import json
    from pandas import json_normalize  ## pandas >= 1.0; pandas.io.json.json_normalize is deprecated
    from FCPython import createPitch
    
    ## Note Statsbomb data uses yards for their pitch dimensions
    pitch_length_X = 120
    pitch_width_Y = 80
    
    ## match id for our El Clasico
    #match_list = [16205, 16131, 16265]
    match_list = ['16157']
    teamA = 'Barcelona'  #<--- adjusted here
    
    for match_id in match_list:
        ## calling the function to create a pitch map
        ## yards is the unit for measurement and
        ## gray will be the line color of the pitch map
        (fig,ax) = createPitch(pitch_length_X, pitch_width_Y,'yards','gray') #< moved into for loop
    
        player_name = 'Lionel Andrés Messi Cuccittini'
    
        ## this is the name of our event data file for
        ## our required El Clasico
        file_name = str(match_id) + '.json'
    
        ## loading the required event data file
        ## Adjust path to your events folder
        with open('Statsbomb/open-data-master/data/events/' + file_name, 'r', encoding='utf-8') as event_file:
            my_data = json.load(event_file)
    
    
        ## get the nested structure into a dataframe
        ## and tag every row with the match id
        df = json_normalize(my_data, sep='_').assign(match_id = file_name[:-5])
        teamB = [x for x in list(df['team_name'].unique()) if x != teamA ][0] #<--- get other team name
    
        ## making the list of all column names
        column = list(df.columns)
    
        ## all the type names we have in our dataframe
        all_type_name = list(df['type_name'].unique())
    
        ## creating a data frame for passes,
        ## dropping every column that contains a null value,
        ## and keeping only the selected player's passes
        pass_df = df.loc[df['type_name'] == 'Pass', :].copy()
        pass_df.dropna(inplace=True, axis=1)
        pass_df = pass_df.loc[pass_df['player_name'] == player_name, :]
    
        ## creating a data frame for ball receipts,
        ## dropping every column that contains a null value,
        ## and keeping only teamA's (Barcelona's) players
        breceipt_df = df.loc[df['type_name'] == 'Ball Receipt*', :].copy()
        breceipt_df.dropna(inplace=True, axis=1)
        breceipt_df = breceipt_df.loc[breceipt_df['team_name'] == teamA, :]
    
        pass_comp, pass_no = 0, 0
        ## pass_comp: completed pass
        ## pass_no: unsuccessful pass
        
        ## iterating through the pass dataframe
        for row_num, passed in pass_df.iterrows():   
        
            if passed['player_name'] == player_name:
                ## for away side
                x_loc = passed['location'][0]
                y_loc = passed['location'][1]
        
                pass_id = passed['id']
               
                events_list = [item for sublist in breceipt_df['related_events'] for item in sublist]
                if pass_id in events_list:
                    ## if pass made was successful
                    color = 'blue'
                    label = 'Successful'
                    pass_comp += 1
                else:
                    ## if pass made was unsuccessful
                    color = 'red'
                    label = 'Unsuccessful'
                    pass_no += 1
        
                ## plotting circle at the player's position
                shot_circle = plt.Circle((pitch_length_X - x_loc, y_loc), radius=2, color=color, label=label)
                shot_circle.set_alpha(alpha=0.2)
                ax.add_patch(shot_circle)
        
                ## parameters for making the arrow
                pass_x = 120 - passed['pass_end_location'][0]
                pass_y = passed['pass_end_location'][1] 
                dx = ((pitch_length_X - x_loc) - pass_x)
                dy = y_loc - pass_y
        
                ## making an arrow to display the pass
                pass_arrow = plt.Arrow(pitch_length_X - x_loc, y_loc, -dx, -dy, width=1, color=color)
        
                ## adding arrow to the plot
                ax.add_patch(pass_arrow)
    
        ## computing pass accuracy
        pass_acc = (pass_comp / (pass_comp + pass_no)) * 100
        pass_acc = str(round(pass_acc, 2))
    
        ## adding text to the plot
        plt.suptitle('{} pass map vs {}'.format(player_name, teamB), fontsize=15) #<-- make dynamic and change to suptitle
        plt.title('Pass Accuracy: {}'.format(pass_acc), fontsize=15) #<-- change to title
    
        ## handling labels
        handles, labels = plt.gca().get_legend_handles_labels()
        by_label = dict(zip(labels, handles))
        plt.legend(by_label.values(), by_label.keys(), loc='best', bbox_to_anchor=(0.9, 1, 0, 0), fontsize=12)
    
        ## editing the figure size and saving it
        fig.set_size_inches(12, 8)
        fig.savefig('{} passmap.png'.format(match_id), dpi=200)  #<-- dynamic file name
    
        ## showing the plot
        plt.show()
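
As a side note, the success check and the accuracy figure could also be computed without the row loop. This is only a sketch on toy frames with invented ids, not a drop-in replacement for the plotting code. It assumes related_events holds lists, and it adds a guard for a player with no passes, which the full code does not have.

```python
import pandas as pd

# Toy frames with invented ids standing in for pass_df and breceipt_df.
pass_df = pd.DataFrame({"id": ["p1", "p2", "p3", "p4"]})
breceipt_df = pd.DataFrame({"related_events": [["p1"], ["p3", "x2"], ["p4"]]})

# explode() puts each list element on its own row; dropna() removes missing ones
received_ids = breceipt_df["related_events"].explode().dropna()

# isin() marks every pass whose id shows up among the receipts
pass_df["completed"] = pass_df["id"].isin(received_ids)

pass_comp = int(pass_df["completed"].sum())
pass_no = len(pass_df) - pass_comp
total = pass_comp + pass_no
# guard against a player with no passes in the match
pass_acc = round(100 * pass_comp / total, 2) if total else 0.0
print(pass_comp, pass_no, pass_acc)  # 3 1 75.0
```

The completed column could then drive the colour and label choices when drawing, instead of recomputing the check for every pass.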
    
