
Select the first row of each group after groupby() and value_counts()


I have a data set named new_data_set that includes, among other columns, release_year and genre.

For each release year, I want to find the genre that appears most often.

So I did this:

new_data_set.groupby('release_year')['genre'].apply(lambda x: x.value_counts())

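The result is a Series indexed by (release_year, genre) that holds the count of each genre, sorted from most to least frequent within each year.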
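To make the shape of that intermediate result concrete, here is a minimal sketch that runs the (corrected) expression on a small frame. The sample values are hypothetical and only stand in for the real data.

```python
import pandas as pd

# Hypothetical sample data standing in for new_data_set
new_data_set = pd.DataFrame({
    'release_year': [2004, 2005, 2004, 2005, 2005, 2004],
    'genre': list('aaabbb'),
})

# One value_counts() per year, stacked into a single Series whose
# index has two levels: release_year and genre
counts = new_data_set.groupby('release_year')['genre'].apply(lambda x: x.value_counts())
print(counts)
```

Within each year the counts come out in descending order, which is why the first row of each group is the answer.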

Now I need to take the first row of each group to get the answer, so the result should look like this:

1960 Drama
1961 Drama
...
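Taking "the first row of each group" literally can be sketched with groupby(level=0).head(1) on the counts Series. This is an alternative to the answer below, shown on the same hypothetical sample data; top_genre is an illustrative name.

```python
import pandas as pd

# Hypothetical sample data standing in for new_data_set
new_data_set = pd.DataFrame({
    'release_year': [2004, 2005, 2004, 2005, 2005, 2004],
    'genre': list('aaabbb'),
})

counts = new_data_set.groupby('release_year')['genre'].apply(lambda x: x.value_counts())

# head(1) keeps the first row of every year group; because value_counts()
# sorts in descending order, that row holds the most frequent genre
first_rows = counts.groupby(level=0).head(1)

# Turn the (release_year, genre) index into a year -> genre Series;
# get_level_values avoids depending on level names, which vary by pandas version
top_genre = pd.Series(
    first_rows.index.get_level_values(1),
    index=first_rows.index.get_level_values(0),
)
print(top_genre)
```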

How should I do this?


Solution

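  • Take index[0] of each group's value_counts() (value_counts() sorts by count in descending order, so index[0] is the most frequent genre) and then call reset_index():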

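    import pandas as pd

    # Small sample frame standing in for the real data
    new_data_set = pd.DataFrame({
        'release_year': [2004, 2005, 2004, 2005, 2005, 2004],
        'genre': list('aaabbb')
    })

    # index[0] of the descending value_counts() is the most frequent genre;
    # reset_index() turns the year-indexed Series back into a DataFrame
    df = (new_data_set.groupby('release_year')['genre']
                      .apply(lambda x: x.value_counts().index[0])
                      .reset_index()
         )
    print(df)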
       release_year genre
    0          2004     a
    1          2005     b
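A different technique, swapped in here rather than taken from the answer above, is to aggregate each group with Series.mode(). A minimal sketch on the same hypothetical sample data; note that when two genres tie, mode() returns all of them sorted, so iloc[0] picks the smallest label, which is not necessarily the one value_counts() lists first.

```python
import pandas as pd

# Hypothetical sample data standing in for new_data_set
new_data_set = pd.DataFrame({
    'release_year': [2004, 2005, 2004, 2005, 2005, 2004],
    'genre': list('aaabbb'),
})

# mode() returns every most-frequent value of the group; iloc[0] keeps one
top = (new_data_set.groupby('release_year')['genre']
                   .agg(lambda s: s.mode().iloc[0])
                   .reset_index())
print(top)
```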