
Getting first row by ID in Python


I have a piece of code that should take the first value of the team column for each ID, store it in a dictionary, and write it to first_team. Instead, it returns the first non-NaN value in each group, skipping rows where team is NaN.

Here is a sample dataset, with first_team as my code currently fills it:

 ID     date           name       team       first_team
 101   05/2012         James      NaN            NY
 101   07/2012         James      NY             NY
 102   06/2013         Adams      NC             NC
 102   05/2014         Adams      AL             NC 

The code I have is:

first_dict = df.groupby('ID').agg({'team':'first'}).to_dict()['team']
df['first_team'] = df['ID'].apply(lambda x: first_dict[x])

Desired output:

  ID      date        name      team         first_team 
  101     05/2012     James      NaN           NaN 
  101     07/2012     James      NY            NaN 
  102     06/2013     Adams      NC            NC 
  102     05/2014     Adams      AL            NC 

Solution

  • If you want to keep the first entry for each ID, NaN included, you can use drop_duplicates:

    first_dict = df.drop_duplicates('ID')[['ID','team']].set_index('ID')['team']
    df['first_team'] = df['ID'].map(first_dict)
    

    Output:

        ID     date   name team first_team
    0  101  05/2012  James  NaN        NaN
    1  101  07/2012  James   NY        NaN
    2  102  06/2013  Adams   NC         NC
    3  102  05/2014  Adams   AL         NC
    

    Note: for future reference, your original code can be written more concisely with transform. Like agg with 'first', it still skips NaN values:

    df['first_team'] = df.groupby('ID')['team'].transform('first')
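To see the two behaviours side by side, here is a minimal, self-contained sketch. It rebuilds the sample data from the question by hand; the variable names skips_nan, first_row and keeps_nan are made up for illustration and are not part of the original code.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question (first_team left out on purpose).
df = pd.DataFrame({
    "ID": [101, 101, 102, 102],
    "date": ["05/2012", "07/2012", "06/2013", "05/2014"],
    "name": ["James", "James", "Adams", "Adams"],
    "team": [np.nan, "NY", "NC", "AL"],
})

# GroupBy 'first' returns the first non-null value in each group,
# so ID 101 gets 'NY' even though its first row is NaN.
skips_nan = df.groupby("ID")["team"].transform("first")

# drop_duplicates keeps the first row per ID by position, NaN included.
# first_row is a Series indexed by ID, which map() accepts like a dict.
first_row = df.drop_duplicates("ID").set_index("ID")["team"]
keeps_nan = df["ID"].map(first_row)

print(pd.DataFrame({"ID": df["ID"], "skips_nan": skips_nan, "keeps_nan": keeps_nan}))
```

Here skips_nan comes out as NY, NY, NC, NC, while keeps_nan comes out as NaN, NaN, NC, NC, which matches the desired output in the question.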