I have this dataset df,
ID Name
23 Dan
24 Bob
This dataset shows relationship of each ID
ID ID2 DATE Status
23 10 2019-06-11 Sent
23 20 2019-06-21 Sent
23 30 2019-06-26 Sent
23 40 2019-06-27 Sent
23 50 2019-12-02 Sent
24 55 2019-06-27 Sent
24 65 2019-06-29 Sent
Here, ID 23 sent letters to ID's 10, 20, 30, 40, 50 on the mentioned DATE. I wanted to get the count of how many each ID sent letters to. I did something like this,
id = df.groupby(['ID'], as_index = False)
id_dict = {}
all_df = id.get_group(ID)
letter_count = 0
for index, row in all_df.iterrows():
if ((row['Status'] == 'Sent')):
letter_count = letter_count + 1
id_dict.update({ID:letter_count})
df['letter_count'] = df['ID'].map(id_dict)
I get this output on df,
ID Name letter_count
23 Dan 4
24 Bob 2
DATE is not taken into consideration. I might need a new column MONTH and even YEAR. I need to have the count of letters sent set to 3 per month. Here, for the month of June 4 letters were sent but i need the count to stay at 3 for every month. This number should be configurable.
Required new output:
ID Name Month Year letter_count
23 Dan 06 2019 3
23 Dan 12 2019 1
24 Bob 06 2019 2
You can apply clip
after doing a groupby
to keep the count at 3
:
# Add Year/Month and check if Sent columns
df['DATE'] = df.DATE.astype('datetime64')
df.assign(Year=df.DATE.dt.strftime('%Y'), \
Month=df.DATE.dt.strftime('%m'), \
Sent=df['Status'].eq('Sent'))
# Your data should look like this at this point:
ID ID2 DATE Status Year Month Sent
0 23 10 2019-06-11 Sent 2019 06 True
1 23 20 2019-06-21 Sent 2019 06 True
2 23 30 2019-06-26 Sent 2019 06 True
3 23 40 2019-06-27 Sent 2019 06 True
4 23 50 2019-12-02 Sent 2019 12 True
5 24 55 2019-06-27 Sent 2019 06 True
6 24 65 2019-06-29 Sent 2019 06 True
# Apply the groupby and clip:
new_df = df.groupby(['ID', 'Year', 'Month'])['Sent'].count().clip(upper=3).reset_index()
# Merge back the names:
new_df = new_df.merge(df_name, how='left')
# Which gives you:
ID Year Month Sent Name
0 23 2019 06 3 Dan
1 23 2019 12 1 Dan
2 24 2019 06 2 Bob
I still wonder if this answers the true purpose of this exercise though. At the end of the day, you're just lying to yourself in the summarized frame.