I am trying to calculate driver activity using GPS data. I've written code that calculates the time difference between each pair of consecutive points in a data frame, keeps only the differences of 60 seconds or less, and sums them.
Here is an example of my data:
  DriveNo       Date.and.Time Latitude Longitude
1     156 2014-01-31 23:00:00 41.88367  12.48778
2     187 2014-01-31 23:00:01 41.92854  12.46904
3     297 2014-01-31 23:00:01 41.89107  12.49270
4      89 2014-01-31 23:00:01 41.79318  12.43212
5      79 2014-01-31 23:00:01 41.90028  12.46275
6     191 2014-01-31 23:00:02 41.85231  12.57741
Reprex:
taxi_noOutlier <- structure(list(
  DriveNo = c(156, 187, 297, 89, 79, 191),
  Date.and.Time = structure(c(1391209200.73917, 1391209201.14846,
    1391209201.22007, 1391209201.47085, 1391209201.63114, 1391209202.04855),
    class = c("POSIXct", "POSIXt"), tzone = "UTC"),
  Latitude = c(41.883670807, 41.928543091, 41.891067505, 41.793178558,
    41.900276184, 41.852306366),
  Longitude = c(12.48777771, 12.469037056, 12.492704391, 12.432122231,
    12.46274662, 12.577406883)),
  row.names = c(NA, 6L), class = "data.frame")
And the loop:
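library(dplyr)  # provides filter()
# Keep only the points recorded for driver 156
taxi_156 <- filter(taxi_noOutlier, DriveNo == 156)
datelist <- taxi_156$Date.and.Time
dlstandard <- as.POSIXlt(datelist)
# Gaps between consecutive points, in seconds; keep gaps of at most 60 s and sum them
gaps <- difftime(tail(dlstandard, -1), head(dlstandard, -1), units = 'secs')
diffsum <- as.numeric(sum(Filter(function(x) x <= 60, gaps)))
print(paste("The total activity time for driver #156 is ", diffsum))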
Which gives an output of:
[1] "The total activity time for driver #264 is 705655.37272048"
My question is, how can I expand this code to find the activity for every other driver? (There are 374 unique drivers, each with thousands of points.) I have tried to replicate the above code using a loop that would calculate the time difference for each DriveNo, but I am new to R and my understanding of loop syntax isn't great.
Can I filter into separate dataframes using a method like this? (This gives an error to do with unexpected bracketing).
for (i in seq_along(taxi_noOutlier$DriveNo))
{
taxi_[[i]] <- filter(taxi_noOutlier, DriveNo == [[i]])
}
and then use my original code on each one? Or is there a more efficient way? Thanks
You can group_by each DriveNo, get the difference between consecutive Date.and.Time values, keep only the differences that are at most a minute, and sum them.
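library(dplyr)
result <- taxi_noOutlier %>%
  group_by(DriveNo) %>%
  # gap to the previous point of the same driver, in seconds (NA for the first point)
  mutate(difftime = difftime(Date.and.Time, lag(Date.and.Time), units = 'secs')) %>%
  # keep gaps of at most 60 seconds; this also drops the NA rows
  filter(difftime <= 60) %>%
  summarise(diffsum = sum(as.numeric(difftime), na.rm = TRUE))
result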
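If you would rather stay close to your original per-driver code, here is a minimal base R sketch of the same idea. It assumes nothing beyond the DriveNo and Date.and.Time columns; the data frame taxi_demo, the helper activity_secs and its max_gap argument are made-up names for illustration, and the timestamps are invented rather than taken from your data. Unlike the pipeline above, it sorts each driver's timestamps first, and it returns 0 for a driver with no gap of 60 seconds or less instead of dropping that driver from the result.

```r
# Hypothetical demo data: three drivers with invented timestamps
taxi_demo <- data.frame(
  DriveNo = c(1, 1, 1, 1, 2, 2, 3),
  Date.and.Time = as.POSIXct("2014-01-31 23:00:00", tz = "UTC") +
    c(0, 10, 30, 130, 5, 50, 7)
)

# Sum the gaps (in seconds) between consecutive timestamps,
# ignoring any gap longer than max_gap seconds
activity_secs <- function(times, max_gap = 60) {
  gaps <- diff(as.numeric(sort(times)))  # POSIXct as numeric is seconds
  sum(gaps[gaps <= max_gap])
}

# Split the timestamps by driver and apply the helper to each group
activity <- sapply(
  split(taxi_demo$Date.and.Time, taxi_demo$DriveNo),
  activity_secs
)
activity
```

With your data, replacing taxi_demo with taxi_noOutlier should give one named value per DriveNo, so no explicit for loop or per-driver data frame is needed.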