```python
df.groupby(['target']).count()
```
Target | data |
---|---|
Negative | 103210 |
Positive | 211082 |
Right now, there are too many `Positive` rows. I want to delete 50% of the rows whose value in the `target` column is `Positive`. How can I do it?
To keep half of the `Positive` rows, `sample` 50% of the `Positive` rows using `frac=0.5` and `drop` those indexes:
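```python
# Randomly pick half of the Positive rows and collect their index labels
indexes = df[df.target == 'Positive'].sample(frac=0.5).index
# Remove those rows from df; Negative rows are untouched
df = df.drop(indexes)
```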
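A small self-contained sketch of the `frac=0.5` approach, run on a hypothetical toy frame (the `target` column name matches the question; the `data` values and row counts are made up for illustration). Passing `random_state` is optional and is used here only so the sample is reproducible.

```python
import pandas as pd

# Hypothetical toy data: 4 Negative rows and 10 Positive rows
df = pd.DataFrame({
    "target": ["Negative"] * 4 + ["Positive"] * 10,
    "data": range(14),
})

# Pick a random half of the Positive rows and collect their index labels
indexes = df[df.target == "Positive"].sample(frac=0.5, random_state=0).index

# Drop those labels; the Negative rows are not affected
df = df.drop(indexes)

print(df.target.value_counts())
```

This relies on `df` having a unique index (the default `RangeIndex` does). With duplicate labels, `drop` removes every row carrying a sampled label, so more than half of the rows could disappear; calling `df.reset_index(drop=True)` first avoids that.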
To keep exactly 100K `Positive` rows, `sample` 100K `Positive` rows using `n=100_000` and `concat` them with the `Negative` rows:
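```python
import pandas as pd

# All Negative rows plus exactly 100K randomly sampled Positive rows
df = pd.concat([
    df[df.target == 'Negative'],
    df[df.target == 'Positive'].sample(n=100_000)
])
```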
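A self-contained check of the `n=` approach on a hypothetical toy frame, with `n=3` standing in for `100_000`. Because `pd.concat` stacks the Negative rows first, the result comes out reordered; the trailing `sort_index()` is an optional addition (not part of the answer above) that restores the original row order.

```python
import pandas as pd

# Hypothetical toy data: 5 Positive and 5 Negative rows, interleaved
df = pd.DataFrame({
    "target": ["Positive", "Negative"] * 5,
    "data": range(10),
})

# Keep every Negative row plus exactly 3 randomly chosen Positive rows
df = pd.concat([
    df[df.target == "Negative"],
    df[df.target == "Positive"].sample(n=3, random_state=0),
]).sort_index()  # optional: put rows back in their original order

print(df)
```

Note that `sample(n=...)` raises a `ValueError` if `n` is larger than the number of `Positive` rows, unless `replace=True` is passed (which samples with replacement and can duplicate rows).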