I have a CSV file with two columns, imagename and ID. There are multiple image names for the same ID, as shown in the picture, and the number of image names per ID varies. I need to keep the same number of image names for every ID and drop the excess rows. For example, if ID 5 has the fewest images, say 8, then every ID should end up with 8 image names.
The code below extracts the first 100 IDs, but the number of images per ID still differs: ID 1 has 11 images, ID 2 has 24, and so on.
`select_id = df.loc[df['id'] <= 100]`
Expected output: an equal number of images for each ID.
You can do this:
    # get the smallest number of rows that any single id has
    min_len = df['id'].value_counts().min()
    # group by id and keep the first min_len rows of each group
    df.groupby('id').head(min_len)
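To see the approach above end to end, here is a minimal, self-contained sketch. The sample data and the `balanced` and `counts` variable names are made up for illustration; only the column names `imagename` and `id` come from the question.

```python
import pandas as pd

# Hypothetical sample data: id 1 has 3 images, id 2 has 2, id 3 has 4.
df = pd.DataFrame({
    "imagename": ["a1.jpg", "a2.jpg", "a3.jpg",
                  "b1.jpg", "b2.jpg",
                  "c1.jpg", "c2.jpg", "c3.jpg", "c4.jpg"],
    "id": [1, 1, 1, 2, 2, 3, 3, 3, 3],
})

# Smallest group size across all ids (2 for this sample data).
min_len = df["id"].value_counts().min()

# Keep the first min_len rows of every id; original row order is preserved.
balanced = df.groupby("id").head(min_len)

# Every id now has the same number of rows.
counts = balanced["id"].value_counts().sort_index()
print(counts.to_dict())
```

If you also want only the first 100 IDs, as in the question, apply the `df['id'] <= 100` filter first and compute `min_len` on the filtered frame, so the minimum reflects only the IDs you keep. If a random selection is preferable to the first rows, `df.groupby('id').sample(n=min_len)` (available in pandas 1.1 and later) should work the same way.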