Search code examples
pythonpandasdictionarypandas-groupbypython-datetime

How do i limit the count/sum of a column to 3 per month?


I have this dataset df,

ID    Name
23    Dan
24    Bob

This dataset shows relationship of each ID

ID    ID2       DATE       Status
23    10     2019-06-11     Sent
23    20     2019-06-21     Sent
23    30     2019-06-26     Sent
23    40     2019-06-27     Sent
23    50     2019-12-02     Sent
24    55     2019-06-27     Sent
24    65     2019-06-29     Sent

Here, ID 23 sent letters to ID's 10, 20, 30, 40, 50 on the mentioned DATE. I wanted to get the count of how many each ID sent letters to. I did something like this,

id = df.groupby(['ID'], as_index = False)
id_dict = {}

all_df = id.get_group(ID)

letter_count = 0
for index, row in all_df.iterrows():
        if ((row['Status'] == 'Sent')):
            letter_count = letter_count + 1

id_dict.update({ID:letter_count})
df['letter_count'] = df['ID'].map(id_dict)

I get this output on df,

ID    Name  letter_count
23    Dan        4
24    Bob        2

DATE is not taken into consideration. I might need a new column MONTH and even YEAR. I need to have the count of letters sent set to 3 per month. Here, for the month of June 4 letters were sent but i need the count to stay at 3 for every month. This number should be configurable.

Required new output:

ID    Name  Month   Year   letter_count
23    Dan    06     2019         3
23    Dan    12     2019         1
24    Bob    06     2019         2

Solution

  • You can apply clip after doing a groupby to keep the count at 3:

    # Add Year/Month and check if Sent columns
    df['DATE'] = df.DATE.astype('datetime64')
    df.assign(Year=df.DATE.dt.strftime('%Y'), \
              Month=df.DATE.dt.strftime('%m'), \
              Sent=df['Status'].eq('Sent'))
    
    # Your data should look like this at this point:    
    
       ID  ID2       DATE Status  Year Month  Sent
    0  23   10 2019-06-11   Sent  2019    06  True
    1  23   20 2019-06-21   Sent  2019    06  True
    2  23   30 2019-06-26   Sent  2019    06  True
    3  23   40 2019-06-27   Sent  2019    06  True
    4  23   50 2019-12-02   Sent  2019    12  True
    5  24   55 2019-06-27   Sent  2019    06  True
    6  24   65 2019-06-29   Sent  2019    06  True
    
    # Apply the groupby and clip:
    new_df = df.groupby(['ID', 'Year', 'Month'])['Sent'].count().clip(upper=3).reset_index()
    
    # Merge back the names:
    new_df = new_df.merge(df_name, how='left')
    
    # Which gives you:
       ID  Year Month  Sent Name
    0  23  2019    06     3  Dan
    1  23  2019    12     1  Dan
    2  24  2019    06     2  Bob
    

    I still wonder if this answers the true purpose of this exercise though. At the end of the day, you're just lying to yourself in the summarized frame.