I am new to coding and I work with positive COVID cases. I have a DataFrame with all the positive cases (e.g. case 1 : positive : 2020-04-30) and I need to sum the positive cases per day as a number (e.g. 3 positive cases : 2020-04-30) in order to create a plot. I know that I have to use the functions groupby() and sum(), but I don't know how to do this with Python. Can you help me, please?
Thank you!
Here is a table that represents my DataFrame:
Here is a table of the result I want:
My actual data are confidential, but here is a sample of it.
You should group by the date column and then sum the results. Try this:
Notice that the patient name should have a numerical counter for keeping track.
import pandas as pd
import datetime
import numpy as np
# this is a dummy dataset; you should already have this in your DataFrame
dict_df = {'Patient': [1,2,3,4,5], 'Positive': ['Positive'] * 5, 'Date': [datetime.date(2020, 4, 21), datetime.date(2020, 4, 22), datetime.date(2020, 4, 21), datetime.date(2020, 4, 23), datetime.date(2020, 4, 22)]}
df = pd.DataFrame(dict_df)
# create a numeric counter: 1 for each 'Positive' row, 0 otherwise
cases = df['Positive'].to_numpy()
counter = np.where(cases == 'Positive', 1, 0)
# add column to data frame
df['counter'] = counter
# use groupby and sum: one row per date with the number of positive cases
results = df.groupby('Date')['counter'].sum().rename('Cases')
print(results)
# Results
# Date
# 2020-04-21    2
# 2020-04-22    2
# 2020-04-23    1
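A shorter variant is sketched below, assuming the same column names `Positive` and `Date` as in the dummy data. Instead of building a separate counter column, it compares the `Positive` column against the string `'Positive'`, which gives a boolean Series; summing that per date counts each `True` as 1. The function name `count_positive_per_day` is hypothetical, and the extra `'Negative'` row is made-up illustration data showing that non-positive rows are not counted.

```python
import datetime

import pandas as pd


def count_positive_per_day(df: pd.DataFrame) -> pd.Series:
    """Return the number of 'Positive' rows for each date (hypothetical helper).

    Assumes the columns are named 'Positive' and 'Date'. The comparison
    yields a boolean Series; summing it per date counts True as 1 and
    False as 0. astype(int) guarantees integer counts.
    """
    is_positive = df['Positive'] == 'Positive'
    return is_positive.groupby(df['Date']).sum().astype(int).rename('Cases')


# Dummy data: five positive cases plus one negative row that must be ignored
df = pd.DataFrame({
    'Patient': [1, 2, 3, 4, 5, 6],
    'Positive': ['Positive'] * 5 + ['Negative'],
    'Date': [datetime.date(2020, 4, 21), datetime.date(2020, 4, 22),
             datetime.date(2020, 4, 21), datetime.date(2020, 4, 23),
             datetime.date(2020, 4, 22), datetime.date(2020, 4, 23)],
})

cases_per_day = count_positive_per_day(df)
print(cases_per_day)
```

If your DataFrame only ever contains positive rows, `df.groupby('Date').size()` gives the same counts without any comparison. For the plot, `cases_per_day.plot()` draws the counts per date using pandas' plotting interface, which requires matplotlib to be installed.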