python, pandas, dataframe, pandas-groupby

Extracting specific number of rows from dataframe


I have a CSV file with two columns, imagename and ID. Each ID has multiple image names, as shown in the picture, and the number of image names per ID varies. I need to keep the same number of image names for every ID and drop the excess rows. For example, if ID 5 has the fewest images, say 8, then every ID should end up with exactly 8 image names.

[Image: table format]

This code extracts the first 100 IDs, but the number of images per ID still differs: ID 1 has 11 images, ID 2 has 24 images, and so on.

`select_id = df.loc[df['id'] <= 100]`

Expected output: an equal number of images for each ID.


Solution

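  • Find the size of the smallest id group, then keep that many rows from every group:

    # Smallest number of rows (images) recorded for any single id
    min_len = df['id'].value_counts().min()

    # Keep only the first min_len rows of each id group;
    # assign the result, since head() returns a new DataFrame
    balanced = df.groupby('id').head(min_len)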
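
To see the approach end to end, here is a minimal self-contained sketch. The sample data is made up for illustration; only the column names `imagename` and `id` come from the question. It also shows a random pick per group with `groupby(...).sample`, which is an alternative that was not part of the original answer and assumes pandas 1.1 or newer.

```python
import pandas as pd

# Hypothetical sample data: id 1 has 4 images, id 2 has 2, id 3 has 3.
df = pd.DataFrame({
    "imagename": [f"img_{i}.jpg" for i in range(9)],
    "id": [1, 1, 1, 1, 2, 2, 3, 3, 3],
})

# Size of the smallest id group (2 for this sample data).
min_len = df["id"].value_counts().min()

# Keep the first min_len rows of each id group, in original row order.
balanced = df.groupby("id").head(min_len)

# Every id should now appear exactly min_len times.
counts = balanced["id"].value_counts().sort_index()

# Alternative (assumption, not from the answer): choose min_len rows
# at random from each group instead of the first ones.
random_balanced = df.groupby("id").sample(n=min_len, random_state=0)

print(balanced)
print(counts)
```

`head` keeps whichever rows come first in the file, so if the row order carries meaning (for example, images sorted by date), the random variant may give a more representative subset.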