Search code examples
pythonpandasdataframeanalysis

How to sum the first values of a groupby result and write this as a df column across the relevant group


I have a data frame looking like this (please see below):

Player Name         Team                    PlayerMarketValue

Steph Curry         Golden State Warriors   169027.4782
Steph Curry         Golden State Warriors   169027.4782
Steph Curry         Golden State Warriors   169027.4782
Steph Curry         Golden State Warriors   169027.4782
Lebron James        All Stars               120896.3772
Lebron James        All Stars               120896.3772
Lebron James        All Stars               120896.3772
Luka Doncic         All Stars               36789.6562
Luka Doncic         All Stars               36789.6562
Luka Doncic         All Stars               36789.6562

I am trying to find a good way to groupby both Team and then PlayerName and from this, take the first instance of PlayerMarketValue and then sum this and write this value across its respective Team.

I've hit the block after getting to:

NBAData['TotalTeamVal'] = NBAData.groupby(['Team', 'Player Name'])['PlayerMarketValue'].first().sum()

Ie: I am looking for (note the All Stars rows in particular):

enter image description here


Solution

  • Without knowing much about the structure of the data, I will assume that a player can only be part in one of the teams, thus I would start

    1. remove the duplicate rows associated with each player
    2. sum the values of players on a team
    3. reset the index and merge all back to the main DataFrame using the team columns as a key
    pd.merge(df, 
             df.drop_duplicates('Player Name').groupby(['Team'])['PlayerMarketValue'].sum().reset_index(), 
             on='Team', 
             suffixes= ['', '_Team']
             )