Search code examples
pythonpandastimestampcumulative-sum

Cumulative sum of time from timestamps in pandas


I am trying to find the sum of time (in seconds) from timestamps in a given trace of events. The data input and output is in a Pandas DataFrame. How would it be possible to do that?

Example Input :

   CaseID                         Timestamps
        0   2016-01-01 09:51:15.304000+00:00    
        0   2016-01-01 09:53:15.352000+00:00    
        1   2016-01-01 09:51:15.774000+00:00    
        1   2016-01-01 09:51:47.392000+00:00    
        1   2016-01-01 09:52:15.403000+00:00        

I would like for the sum to be added cumulatively as well; disregarding minuscule differences such as the milliseconds.

Example Output:

Case ID       sum_time
      0              0                
      0            120
      1              0
      1             32
      1             60

Solution

  • This should should solve the problem,

    import numpy as np
    import pandas as pd
    
    # recreate original data
    ts = """\
    2016-01-01 09:51:15.304000+00:00
    2016-01-01 09:53:15.352000+00:00
    2016-01-01 09:51:15.774000+00:00
    2016-01-01 09:51:47.392000+00:00
    2016-01-01 09:52:15.403000+00:00""".split("\n")
    
    df = pd.DataFrame({"CaseID": [0, 0, 1, 1, 1],
                       "Timestamp": [pd.Timestamp(tmp) for tmp in ts]})
    
    
    # solve the problem
    
    def calc_csum(partial_frame):
        """
        Takes a data frame with a Timestamp column;
        Add new colum with cummulative sum.
        """
       
        # 1. create the difference array
        r = partial_frame.Timestamp.diff()
        
        # 2. fill the first value (NaT) with zero
        r[r.isna()] = pd.Timedelta(0)
        # 3. convert to seconds and use cumsum -> new column
        partial_frame["cs"] = np.cumsum(r.dt.total_seconds().values)
        return partial_frame
    
    # apply to each "sub frame" with same CaseID
    res = df.groupby("CaseID").apply(calc_csum)
    print(res)
    

    Result:

        CaseID                        Timestamp       cs
    0       0   2016-01-01 09:51:15.304000+00:00    0.000
    1       0   2016-01-01 09:53:15.352000+00:00  120.048
    2       1   2016-01-01 09:51:15.774000+00:00    0.000
    3       1   2016-01-01 09:51:47.392000+00:00   31.618
    4       1   2016-01-01 09:52:15.403000+00:00   59.629