I am working on a data frame that looks like this:
Id feat1 value
c1 c22 51
c2 c12 83
c3 d31 42
c4 a19 110
c5 d44 56
. . .
. . .
. . .
The value column ranges over [40, 240]. I want to downsample the data frame so that I get 300 rows for each of the following bins: [40-50, 50-60, 60-70, 70-80, 80-90, 90-100, 100-110, ...]
You can create the bins with pandas.cut(), then group by bin and draw the same number of rows from each bin:
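import pandas as pd
# Intervals are right-closed by default, e.g. (40, 50]; include_lowest=True keeps value == 40 in the first bin
df['bin'] = pd.cut(df['value'], range(40, 250, 10), include_lowest=True)
sampled_df = df.groupby('bin', observed=True).apply(lambda x: x.sample(300)).reset_index(drop=True)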
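One caveat: x.sample(300) raises a ValueError if any bin holds fewer than 300 rows. A minimal sketch of a variant that caps the sample size at the bin size is below. The demo frame, the seed, and the names edges, n_per_bin and parts are made up for illustration and are not from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical demo data standing in for the real frame (values in [40, 240])
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Id": [f"c{i}" for i in range(1, 20001)],
    "value": rng.integers(40, 241, size=20000),
})

# Bin edges 40, 50, ..., 240; include_lowest=True keeps value == 40
edges = list(range(40, 250, 10))
df["bin"] = pd.cut(df["value"], bins=edges, include_lowest=True)

n_per_bin = 300
# Take at most n_per_bin rows from each non-empty bin
parts = [
    group.sample(n=min(n_per_bin, len(group)), random_state=0)
    for _, group in df.groupby("bin", observed=True)
]
sampled_df = pd.concat(parts, ignore_index=True)
```

Bins with fewer than 300 rows are kept whole here. If you would rather oversample them, x.sample(300, replace=True) is another option, at the cost of duplicated rows.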