I have the given data - all bike trips that started from a particular station over the month of October 2013. I'd like to count the amount of trips that occurred within ten-minute time intervals. There should be a total of 144 rows with a sum of all of the trips that occurred within that interval for the entire month. How would one cut the data.frame and then aggregate by interval (so that trips occurring between 00:00:01 and 00:10:00 are counted in the second row, between 00:10:01 and 00:20:00 are counted in the third row, and so on...)?
head(one.station) tripduration starttime stoptime start.station.id start.station.name 59 803 2013-10-01 00:11:49 2013-10-01 00:25:12 521 8 Ave & W 31 St 208 445 2013-10-01 00:40:05 2013-10-01 00:47:30 521 8 Ave & W 31 St 359 643 2013-10-01 01:25:57 2013-10-01 01:36:40 521 8 Ave & W 31 St 635 388 2013-10-01 05:30:30 2013-10-01 05:36:58 521 8 Ave & W 31 St 661 314 2013-10-01 05:38:00 2013-10-01 05:43:14 521 8 Ave & W 31 St 768 477 2013-10-01 05:54:49 2013-10-01 06:02:46 521 8 Ave & W 31 St start.station.latitude start.station.longitude end.station.id end.station.name 59 40.75045 -73.99481 2003 1 Ave & E 18 St 208 40.75045 -73.99481 505 6 Ave & W 33 St 359 40.75045 -73.99481 508 W 46 St & 11 Ave 635 40.75045 -73.99481 459 W 20 St & 11 Ave 661 40.75045 -73.99481 462 W 22 St & 10 Ave 768 40.75045 -73.99481 457 Broadway & W 58 St end.station.latitude end.station.longitude bikeid usertype birth.year gender 59 40.73416 -73.98024 15139 Subscriber 1985 1 208 40.74901 -73.98848 20538 Subscriber 1990 2 359 40.76341 -73.99667 19935 Customer \\N 0 635 40.74674 -74.00776 14781 Subscriber 1955 1 661 40.74692 -74.00452 17976 Subscriber 1982 1 768 40.76695 -73.98169 19022 Subscriber 1973 1
So that the output looks like this
output interval total_trips 1 00:00:00 0 2 00:10:00 1 3 00:20:00 2 4 00:30:00 3 5 00:40:00 4
Here it is using only start time:
library(lubridate)
library(dplyr)
tripduration <- floor(runif(6) * 1000)
start_times <- as.POSIXlt(
c("2013-10-01 00:11:49"
,"2013-10-01 00:40:05"
,"2013-10-01 01:25:57"
,"2013-10-01 05:30:30"
,"2013-10-01 05:38:00"
,"2013-10-01 05:54:49")
)
time_bucket <- start_times - minutes(minute(start_times) %% 10) - seconds(second(start_times))
df <- data.frame(tripduration, start_times, time_bucket)
summarized <- df %>%
group_by(time_bucket) %>%
summarize(trip_count = n())
summarized <- as.data.frame(summarized)
out_buckets <- data.frame(out_buckets = seq(as.POSIXlt("2013-10-01 00:00:00"), as.POSIXct("2013-10-01 06:0:00"), by = 600))
out <- left_join(out_buckets, summarized, by = c("out_buckets" = "time_bucket"))
out$trip_count[is.na(out$trip_count)] <- 0
out
out_buckets trip_count
1 2013-10-01 00:00:00 0
2 2013-10-01 00:10:00 1
3 2013-10-01 00:20:00 0
4 2013-10-01 00:30:00 0
5 2013-10-01 00:40:00 1
6 2013-10-01 00:50:00 0
7 2013-10-01 01:00:00 0
8 2013-10-01 01:10:00 0
9 2013-10-01 01:20:00 1
10 2013-10-01 01:30:00 0
11 2013-10-01 01:40:00 0
12 2013-10-01 01:50:00 0
13 2013-10-01 02:00:00 0
14 2013-10-01 02:10:00 0
15 2013-10-01 02:20:00 0
16 2013-10-01 02:30:00 0
17 2013-10-01 02:40:00 0
18 2013-10-01 02:50:00 0
19 2013-10-01 03:00:00 0
20 2013-10-01 03:10:00 0
21 2013-10-01 03:20:00 0
22 2013-10-01 03:30:00 0
23 2013-10-01 03:40:00 0
24 2013-10-01 03:50:00 0
25 2013-10-01 04:00:00 0
26 2013-10-01 04:10:00 0
27 2013-10-01 04:20:00 0
28 2013-10-01 04:30:00 0
29 2013-10-01 04:40:00 0
30 2013-10-01 04:50:00 0
31 2013-10-01 05:00:00 0
32 2013-10-01 05:10:00 0
33 2013-10-01 05:20:00 0
34 2013-10-01 05:30:00 2
35 2013-10-01 05:40:00 0
36 2013-10-01 05:50:00 1
37 2013-10-01 06:00:00 0