Here is a dummy example of the DF I'm working with. It effectively comprises binned data, where the first column gives a category and the second column the number of individuals in that category.
df = pd.DataFrame(data={'Category':['A','B','C','D','E','F','G','H','I'],
'Count':[1000,200,850,350,4000,20,35,4585,2],})
I want to take a random sample, say of 100 individuals, from these data. So for example my random sample could be:
sample1 = pd.DataFrame(data={'Category':['A','B','C','D','E','F','G','H','I'],
'Count':[15,2,4,4,35,0,15,25,0],})
I.e. the sample cannot contain more individuals than are actually in any of the categories. Sampling 0 individuals from a category is possible (and more likely for categories with a lower Count).
How could I go about doing this? I feel like there must be a simple answer but I can't think of it!
Thank you in advance!
You can try sample with replacement:
df.sample(n=100, replace=True, weights=df.Count).groupby(by='Category').count()