pandas, dataframe, where-clause, rows

Select only a number of rows from a pandas Dataframe based on a condition


I want to sample n rows for each distinct value in the column named club.


columns = ['long_name','age','dob','height_cm','weight_kg','club']
teams = ['Real Madrid','FC Barcelona','Chelsea','CA Osasuna','Paris Saint-Germain','FC Bayern München','Atlético Madrid','Manchester City','Liverpool','Hull City']
playersDataDB = playersData.loc[playersData['club'].isin(teams), columns]
playersDataDB.head()

In the code above I select my desired columns for the rows whose club is one of the selected teams.

The output of this code is a 299 rows × 6 columns DataFrame, meaning that I'm getting all the players from each team, but I want just 16 of them from each club.


Solution

  • I'm not sure what your DataFrame looks like, but you could group by club and then use head(16) to keep only the first 16 rows of each group.

    df.groupby('club').head(16)
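
Since the question asks to sample rows, here is a minimal sketch comparing head(n) with a random pick per club. The `players` frame is made-up toy data (not the asker's `playersData`), and `n = 2` stands in for 16 to keep it small. Note that `groupby(...).sample` needs pandas 1.1 or newer and, as far as I know, raises an error if a club has fewer than n rows unless you pass replace=True, whereas head(n) simply returns fewer rows for that club.

```python
import pandas as pd

# Hypothetical toy data standing in for the question's playersData
players = pd.DataFrame({
    "long_name": [f"Player {i}" for i in range(12)],
    "club": ["Chelsea"] * 5 + ["Liverpool"] * 4 + ["Hull City"] * 3,
})

n = 2  # the question uses 16

# First n rows of each club, kept in the original row order
first_n = players.groupby("club").head(n)

# Random n rows of each club; random_state makes the draw repeatable
random_n = players.groupby("club").sample(n=n, random_state=0)

print(first_n["club"].value_counts().to_dict())
print(random_n["club"].value_counts().to_dict())
```

Both results contain n rows per club; only the choice of which rows differs.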