I have a pandas.Series of time-stamped data, basically a sequence of events:
0 2012-09-05 19:28:52
1 2012-09-05 19:28:52
2 2012-09-05 19:44:37
3 2012-09-05 19:44:37
4 2012-09-05 20:04:53
5 2012-09-05 20:04:53
6 2012-09-05 20:12:59
7 2012-09-05 20:13:15
8 2012-09-05 20:13:15
9 2012-09-05 20:13:15
I'd like to create a pandas.TimeSeries over a specific pandas.date_range (e.g. a 15-minute interval: pandas.date_range(start, end, freq='15T')) that holds the count of events for each period. How can this be accomplished?
thanks, Peter
If you use the timestamps of the events as the index of the series rather than as its data, resample can do this. In the example below, the index of series s holds the timestamps and the data is the event_id (basically the index of your original series).
In [47]: s
Out[47]:
event_id
timestamp
2012-09-05 19:28:52 0
2012-09-05 19:28:52 1
2012-09-05 19:44:37 2
2012-09-05 19:44:37 3
2012-09-05 20:04:53 4
2012-09-05 20:04:53 5
2012-09-05 20:12:59 6
2012-09-05 20:13:15 7
2012-09-05 20:13:15 8
2012-09-05 20:13:15 9
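One way to get from the series in the question to this layout, as a rough sketch. The name `events` is hypothetical and stands in for the original series (integer index, timestamps as values):

```python
import pandas as pd

# `events` is a stand-in for the series from the question (hypothetical name):
# integer index 0..9, timestamps as the values.
events = pd.Series(pd.to_datetime([
    "2012-09-05 19:28:52", "2012-09-05 19:28:52",
    "2012-09-05 19:44:37", "2012-09-05 19:44:37",
    "2012-09-05 20:04:53", "2012-09-05 20:04:53",
    "2012-09-05 20:12:59", "2012-09-05 20:13:15",
    "2012-09-05 20:13:15", "2012-09-05 20:13:15",
]))

# Swap the roles: the timestamps become the index and the old integer
# index becomes the data (the event_id).
s = pd.Series(
    events.index.to_numpy(),
    index=pd.DatetimeIndex(events.to_numpy(), name="timestamp"),
    name="event_id",
)
```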
resample (which can also be used on a DataFrame) returns a new series, in this case with 15-minute periods. Each bucket (period) is referred to by its end time; you can control this with the label argument.
In [48]: s.resample('15Min', how=len)
Out[48]:
event_id
timestamp
2012-09-05 19:30:00 2
2012-09-05 19:45:00 2
2012-09-05 20:00:00 0
2012-09-05 20:15:00 6
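A note for more recent pandas releases: the how= argument was deprecated and later removed, and 15-minute buckets are now labelled by their start time by default. Below is a minimal sketch of an equivalent call, assuming a current pandas version. It also reindexes the counts onto an explicit date_range, as asked in the question; the start and end values of that range are made up for illustration.

```python
import pandas as pd

# Event timestamps copied from the question.
timestamps = pd.to_datetime([
    "2012-09-05 19:28:52", "2012-09-05 19:28:52",
    "2012-09-05 19:44:37", "2012-09-05 19:44:37",
    "2012-09-05 20:04:53", "2012-09-05 20:04:53",
    "2012-09-05 20:12:59", "2012-09-05 20:13:15",
    "2012-09-05 20:13:15", "2012-09-05 20:13:15",
])

# Timestamps as the index, event ids 0..9 as the data.
s = pd.Series(range(len(timestamps)), index=timestamps, name="event_id")

# Count events per 15-minute bucket. closed="right" and label="right"
# make each bucket include its end time and be labelled by it, which
# matches the output shown above.
counts = s.resample("15min", closed="right", label="right").count()

# Align the counts to a specific range. The start and end here are
# illustrative assumptions; periods without events are filled with 0.
target = pd.date_range("2012-09-05 19:00", "2012-09-05 21:00", freq="15min")
counts_on_range = counts.reindex(target, fill_value=0)

print(counts)
print(counts_on_range)
```

If you would rather have buckets labelled by their start time, drop the closed and label arguments and reindex against a range built the same way.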