Tags: python, dataframe, group-by, sum, numeric

How to sum positive cases by date in Python


I am new to coding and I am working with positive COVID cases. I have a DataFrame with one row per positive case (e.g. case 1 : positive : 2020-04-30), and I need the number of positive cases per day as a numeric value (e.g. 3 positive cases : 2020-04-30) in order to create a plot. I don't know how to do this in Python. I know that I have to use the functions groupby() and sum(), but I don't know how. Can you help me, please?

Thank you !

Here is a table that represents my DataFrame

Here is a table of what I want to have as a result

My actual data are confidential, but here is a sample of them


Solution

  • You should build a numeric counter column, then group by date and sum it. Try this:

    Note that each row needs a numeric counter (1 for a positive case, 0 otherwise) so that the cases can be summed.

    import pandas as pd
    import datetime
    import numpy as np
    
    # this is a dummy set; you should already have this in your data frame
    dict_df = {'Patient': [1,2,3,4,5], 'Positive': ['Positive'] * 5, 'Date': [datetime.date(2020, 4, 21), datetime.date(2020, 4, 22), datetime.date(2020, 4, 21), datetime.date(2020, 4, 23), datetime.date(2020, 4, 22)]}
    df = pd.DataFrame(dict_df)
    
    # create a numeric counter: 1 for 'Positive', 0 otherwise
    cases = df['Positive'].to_numpy()
    counter = np.where(cases == 'Positive', 1, 0)
    
    # add column to data frame
    df['counter'] = counter
    
    # use groupby and sum
    results = df['counter'].groupby(df['Date']).sum()
    print(results)
    
    # Results: a Series named 'counter', indexed by Date
    # Date
    # 2020-04-21    2
    # 2020-04-22    2
    # 2020-04-23    1
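
If the only goal is a per-day count, a shorter route may be to filter the positive rows and count them per date, without a separate counter column. This is a minimal sketch, assuming the column names from the dummy data above (`Positive`, `Date`); the `Negative` row and the `Cases` column name are hypothetical additions for illustration.

```python
import datetime

import pandas as pd

# Dummy data mirroring the sample above. The 'Negative' row is a
# hypothetical addition to show that non-positive rows are ignored.
df = pd.DataFrame({
    'Patient': [1, 2, 3, 4, 5, 6],
    'Positive': ['Positive'] * 5 + ['Negative'],
    'Date': [
        datetime.date(2020, 4, 21),
        datetime.date(2020, 4, 22),
        datetime.date(2020, 4, 21),
        datetime.date(2020, 4, 23),
        datetime.date(2020, 4, 22),
        datetime.date(2020, 4, 23),
    ],
})

# Keep only the positive rows, then count the rows for each date.
positives = df[df['Positive'] == 'Positive']

# reset_index(name=...) turns the grouped Series into a two-column
# DataFrame ('Date', 'Cases'), which is convenient for plotting.
daily = positives.groupby('Date').size().reset_index(name='Cases')
print(daily)
```

With matplotlib installed, something like `daily.plot(x='Date', y='Cases', kind='bar')` should then draw one bar per day.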