Search code examples
pythonpandasdateoopoperation

Sorting my data frame by date (d/m/y + hour: min: sec)


I am trying to sort the values of my columns depending on the date (d/m/y + hour: min: sec). Below I will show you an example of the format of the given data:

Initiator Price date
XXX 560 13/05/2020 11:05:35
Glovoapp 250 12/05/2020 13:07:15
Glovoapp 250 13/04/2020 12:09:25
expected output:
if the user selects a date from the 10/04/2020 | 00:00:00 to 15/05/2020 |00:00:00 :
Glovoapp: 500
XXX: 560

if the user selects a date from the 10/04/2020 00:00:00 to 01/05/2020 00:00:00: 
Glovoapp: 250

So far I am able to sum the prices depending on the initiators without the date filtering. Any suggestions on what I should do ?

 def sum_method(self):
        montant_init = self.data.groupby("Initiateur")["Montant (centimes)"].sum()
        print(montant_init)
        return montant_init

^ I use this method for the calculation. I hope I am clear enough and thanks.

Tried answer; please correct me:

class evaluation():

    def __init__(self, df):
        self.df = df

    # Will receive 'actual' datetime from df, and user defined 'start' and 'stop' datetimes.
    def in_range(actual, start, stop):
        return start <= actual <= stop

    def evaluate(self):
        user_start = input("Enter your start date (dd.mm.yyyy hour:min:second): ")
        user_stop = input("Enter your end date (dd.mm.yyyy hour:min:second): ")

        # creates series of True or False selecting proper rows.
        mask = self.df['Date'].apply(self.in_range, args=(user_start, user_stop))

        # Do the groupby and sum on only those rows.
        montant_init = self.df.loc[mask].groupby("Initiateur")["Montant (centimes)"].sum()
        print(montant_init)

output when printing: self.df.loc[mask]
Empty DataFrame
Columns: [Opération, Initiateur, Montant (centimes), Monnaie, Date, Résultat, Compte marchand, Adresse IP Acheteur, Marque de carte]
Index: []


Solution

  • The below works. There are two steps:

    1. Make a mask to select the right rows
    2. Then do the groupby and sum on only those rows

    Mask function:

    # Will receive 'actual' datetime from df, and user defined 'start' and 'stop' datetimes.
    def in_range(actual, start, stop):
        return start <= actual <= stop
    

    Then apply the mask and perform the groupby:

    # creates series of True or False selecting proper rows.
    mask = df['date'].apply(in_range, args=(user_start, user_stop))
    
    # Do the groupby and sum on only those rows.
    df2 = df.loc[mask].groupby('Initiator').sum()
    

    Note that user_start and user_stop should be the defined start and stop datetimes by the user.

    And you're done!


    UPDATE: to include the methods as part of a class:

    class evaluation():
    
        def __init__(self, df):
            self.df = df
    
        # Will receive 'actual' datetime from df, and user defined 'start' and 'stop' datetimes. Add 'self' as arg in method.
        def in_range(self, actual, start, stop):
            return start <= actual <= stop
    
        def evaluate(self):
            user_start = pd.to_datetime(input("Enter your start date (yyyy.mm.dd hour:min:second): "))
            user_stop = pd.to_datetime(input("Enter your end date (yyyy.mm.dd hour:min:second): "))
    
            # creates series of True or False selecting proper rows.
            mask = self.df['Date'].apply(self.in_range, args=(user_start, user_stop)) 
    
            # Do the groupby and sum on only those rows.
            amount_init = self.df.loc[mask].groupby("Initiator")["Price"].sum()
            print(amount_init)
    

    Then to instantiate an object of the new class:

    import pandas as pd
    import dateutil.parser as dtp
    import evaluation as eval # this is the class we just made
    
    data = {
        'Initiator': ['XXX', 'Glovoapp', 'Glovoapp'],
        'Price': [560, 250, 250],
        'Date': [dtp.parse('13/05/2020 11:05:35'), dtp.parse('12/05/2020 13:07:15'), dtp.parse('13/04/2020 12:09:25')]
    }
    
    df = pd.DataFrame(data)
    eval_obj = eval.evaluation(df)
    eval_obj.evaluate()