Search code examples
pythonpandasplotcumulative-sum

Cumulative plot from a Dataframe with only DateTime information


I have a csv file like this:

2021-01-05 10:57:12.762000, REDDE EHZ AM 00, trigger
2021-01-05 10:58:26.622000, REDDE EHZ AM 00, trigger
2021-01-05 11:02:16.772000, REDDE EHZ AM 00, trigger
2021-01-05 11:02:34.042000, REDDE EHZ AM 00, trigger
2021-01-05 17:12:07.221999, REDDE EHZ AM 00, trigger
2021-01-06 01:42:45.501999, REDDE EHZ AM 00, trigger
2021-01-06 01:44:24.481999, REDDE EHZ AM 00, trigger
2021-01-06 01:44:58.051999, REDDE EHZ AM 00, trigger
2021-01-06 01:45:14.871999, REDDE EHZ AM 00, trigger
2021-01-06 01:47:10.901999, REDDE EHZ AM 00, trigger
2021-01-06 07:57:33.221999, REDDE EHZ AM 00, trigger
2021-01-06 07:57:48.821999, REDDE EHZ AM 00, trigger
2021-01-06 07:58:51.031999, REDDE EHZ AM 00, trigger
2021-01-06 07:59:27.001999, REDDE EHZ AM 00, trigger
2021-01-06 08:00:56.871999, REDDE EHZ AM 00, trigger
2021-01-06 11:28:17.191999, REDDE EHZ AM 00, trigger
2021-01-06 11:28:46.201999, REDDE EHZ AM 00, trigger
2021-01-06 11:29:19.111999, REDDE EHZ AM 00, trigger
2021-01-06 11:29:41.891999, REDDE EHZ AM 00, trigger
2021-01-06 11:30:51.901999, REDDE EHZ AM 00, trigger
2021-01-06 11:31:21.921999, REDDE EHZ AM 00, trigger
2021-01-06 11:32:23.001999, REDDE EHZ AM 00, trigger
2021-01-06 11:32:58.271999, REDDE EHZ AM 00, trigger
2021-01-07 11:33:46.891999, REDDE EHZ AM 00, trigger
2021-01-07 12:38:50.021999, REDDE EHZ AM 00, trigger
2021-01-07 12:39:53.881999, REDDE EHZ AM 00, trigger
2021-01-08 12:42:07.371999, REDDE EHZ AM 00, trigger
2021-01-08 12:42:46.441999, REDDE EHZ AM 00, trigger
2021-01-09 12:44:14.291999, REDDE EHZ AM 00, trigger

I added the header with:

df = pd.read_csv(r'D:\Inves\SM\CC_Cbba\REDPy\OSCREDDE_3_\redde_3_trigs.dat',
                 sep=',', header=None, usecols=[0, 1, 2])
headers =  ["TrigDT", "Sta", "Type"]

The output is:

                       TrigDT               Sta      Type
0  2021-01-05 10:57:12.762000   REDDE EHZ AM 00   trigger
1  2021-01-05 10:58:26.622000   REDDE EHZ AM 00   trigger
2  2021-01-05 11:02:16.772000   REDDE EHZ AM 00   trigger
3  2021-01-05 11:02:34.042000   REDDE EHZ AM 00   trigger
4  2021-01-05 17:12:07.221999   REDDE EHZ AM 00   trigger
...

I created a Date colum in roder to try to group the information by days:

df['TrigDT'] = pd.to_datetime(df['TrigDT'])
df['Date'] = df['TrigDT'].dt.date

I tried to have a cumulative sum with Index because I do not have a colum with counter of events, then I tried to group by days but I fail:

df = df.groupby('Date').index.sum()
df = df.groupby(df.index.day).cumsum().reset_index()

The idea is to create a cumulative plot with the DataFrame information (X axis with dates a Y axis with cumulative information), I tried to have a plot like https://stackoverflow.com/questions/53895480/python-plot-timedelta-and-cumulative-values

Would you mind to give me some tips to reach the objective?, the expected output could be like this, in my case only with one station that is called REDDE:

enter image description here Thank you


Solution

  • Not sure I understood what a, b, c would be exactly in your chart, but here is how you can accumulate the number of triggers daily:

    # Load the data and set 'TrigDT' as the index
    df = pd.read_csv('path_to_file.csv', header=None, usecols=(0, 1, 2), names=('TrigDT', 'Sta', 'Type'))    
    df.TrigDT = pd.to_datetime(df.TrigDT)
    df.set_index('TrigDT', inplace=True)
    
    # Resample to daily and count
    daily_acc = df.resample('d').count().cumsum()                                                    
    daily_acc.plot()