Search code examples
python-3.xpandasdataframegroup-by

Delete rows from bottom, from every group/id of a dataframe


I have a dataset as such:

#Load the required libraries
import pandas as pd

#Create dataset
data = {'id': [1, 1, 1, 1, 1,1, 1, 1, 1, 1, 1,
               2, 2,2,2,2,
               3, 3, 3, 3, 3, 3,
               4, 4,4,4,4,4,4,4,
               5, 5, 5, 5, 5,5, 5, 5,5],
        'cycle': [1,2, 3, 4, 5,6,7,8,9,10,11,
                  1,2, 3,4,5,
                  1,2, 3, 4, 5,6,
                  1,2,3,4,5,6,7,8,
                  1,2, 3, 4, 5,6,7,8,9,],
        'Salary': [7, 7, 7,8,9,10,11,12,13,14,15,
                   4, 5,6,7,8,
                   8,9,10,11,12,13,
                   8,1,2,3,4,5,6,7,
                   7, 7,9,10,11,12,13,14,15,],
        'Children': ['No', 'Yes', 'Yes', 'Yes', 'Yes', 'No','No', 'Yes', 'Yes', 'Yes', 'No',
                     'Yes', 'No', 'Yes', 'No', 'Yes',
                     'No','Yes', 'Yes', 'No','No', 'Yes',
                     'Yes','Yes', 'Yes', 'No','No', 'Yes', 'Yes', 'Yes',
                      'No',  'Yes', 'No', 'No', 'Yes', 'Yes', 'Yes', 'Yes', 'No',],
        'Days': [123, 128, 66, 66, 120, 141, 52,96, 120, 141, 52,
                 96, 120,128, 66, 120,
                 15,123, 128, 66, 120, 141,
                 141,128, 66, 123, 128, 66, 120,141, 
                 123, 128, 66, 123, 128, 66, 120, 141, 52,],
        }

#Convert to dataframe
df = pd.DataFrame(data)
print("df = \n", df)

The dataframe looks as such:

enter image description here

Here, every id has different cycles as per the 'cycle' column. For example,

id-1 has maximum 11 cycles.

id-2 has maximum 5 cycles.

id-3 has maximum 6 cycles.

id-4 has maximum 8 cycles.

id-5 has maximum 9 cycles.

Now, for every id, I wish to delete rows from the bottom.

For example,

For id-1, delete last four rows.

For id-2, delete last two rows.

For id-3, delete last three rows.

For id-4, delete last five rows.

For id-5, delete last six rows.

The dataframe then looks as such:

enter image description here

Can somebody please let me know how do I achieve this task in Python?


Solution

  • Create dictionary for specify number of deleted rows and compare by counter from back by GroupBy.cumcount with ascending=False mapped id column by Series.map, so possible filter by boolean indexing:

    d = {1:4,2:2,3:3,4:5,5:6}
    
    df = df[df.groupby('id').cumcount(ascending=False).ge(df['id'].map(d))]
    print (df)
        id  cycle  Salary Children  Days
    0    1      1       7       No   123
    1    1      2       7      Yes   128
    2    1      3       7      Yes    66
    3    1      4       8      Yes    66
    4    1      5       9      Yes   120
    5    1      6      10       No   141
    6    1      7      11       No    52
    11   2      1       4      Yes    96
    12   2      2       5       No   120
    13   2      3       6      Yes   128
    16   3      1       8       No    15
    17   3      2       9      Yes   123
    18   3      3      10      Yes   128
    22   4      1       8      Yes   141
    23   4      2       1      Yes   128
    24   4      3       2      Yes    66
    30   5      1       7       No   123
    31   5      2       7      Yes   128
    32   5      3       9       No    66