
How to delete 50% of rows that share a certain column value


    df.groupby(['target']).count()

                data
    target
    Negative  103210
    Positive  211082

Right now, my Positive class is too large. I want to delete 50% of the rows whose value in the target column is Positive. How can I do it?


Solution

  • To keep half of the Positive rows, sample 50% of the Positive rows using frac=0.5 and drop those indexes:

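    # Randomly pick half of the Positive rows (pass random_state=... for reproducibility)
    indexes = df[df.target == 'Positive'].sample(frac=0.5).index
    # Drop them by index label (assumes df has a unique index)
    df = df.drop(indexes)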
    

    To keep exactly 100K Positive rows, sample 100K Positive rows using n=100_000 and concat them with the Negative rows:

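    import pandas as pd

    # Keep every Negative row plus a random sample of exactly 100,000 Positive rows
    df = pd.concat([
        df[df.target == 'Negative'],
        df[df.target == 'Positive'].sample(n=100_000)
    ])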
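To sanity-check both approaches, here is a small self-contained sketch on a made-up DataFrame. The column names target and data mirror the question; the toy values, the cap of n=3 (standing in for 100_000) and random_state=0 are illustrative assumptions. Passing random_state makes the random selection reproducible between runs.

```python
import pandas as pd

# Hypothetical toy data standing in for the real df: 4 Negative rows and 10 Positive rows
df = pd.DataFrame({
    "target": ["Negative"] * 4 + ["Positive"] * 10,
    "data": range(14),
})

# Approach 1: drop a random 50% of the Positive rows.
# random_state=0 fixes the sample; drop() assumes the index labels are unique.
positive_idx = df[df.target == "Positive"].sample(frac=0.5, random_state=0).index
halved = df.drop(positive_idx)

# Approach 2: keep every Negative row plus exactly n Positive rows
# (n=3 here is an illustrative stand-in for n=100_000).
capped = pd.concat([
    df[df.target == "Negative"],
    df[df.target == "Positive"].sample(n=3, random_state=0),
])

# Per-class row counts after each approach
print(halved["target"].value_counts().to_dict())
print(capped["target"].value_counts().to_dict())
```

Note that the concat version returns all Negative rows first, followed by the sampled Positive rows; if row order matters downstream, chaining .sample(frac=1) on the result shuffles it.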