I have the following dataframe (df) and would like to interpolate Lat, Lon coordinates at an equidistant interval (e.g. every 250 m) or time interval (e.g. every 2 min).
> head(df)
ID Latitude Longitude trip date.time
1 1 10.30447 -109.2323 1 2005-01-07 11:25:26
2 1 10.30425 -109.2321 1 2005-01-07 11:25:36
3 1 10.30314 -109.2326 1 2005-01-07 11:25:46
4 1 10.30199 -109.2328 1 2005-01-07 11:25:56
5 1 10.30079 -109.2334 1 2005-01-07 11:26:06
6 1 10.30006 -109.2331 1 2005-01-07 11:26:16
I tried to do this using R package zoo and the following code I found in a similar question posted:
full.time <- with(df,seq(date.time[1],tail(date.time,1),by=1))
library(zoo)
df.zoo <- zoo(df[,3:4],df$date.time) # convert to zoo object
result <- na.approx(df.zoo,xout=full.time) # interpolate; result is also a zoo object
head(result)
However, as my dataframe includes multiple trips (df$trip) of multiple individuals (df$ID), the timestamps are not unique and I get the following warning:
> df.zoo <- zoo(df[,3:4],df$date.time) # convert to zoo object
Warning message:
In zoo(df[, 3:4], df$date.time) :
some methods for “zoo” objects do not work if the index entries in ‘order.by’ are not unique
How can I run the above code (in a loop?) so that each trip is interpolated separately?
Your sample is not representative: you ask for interpolation in 2 min increments, but the data set spans less than 2 min. So in this example I use 30 sec increments. Also, you only provide one ID/trip combination, so it is impossible to verify that this works as you want across groups. Nevertheless, since the interpolation is done separately for each ID/trip group, it should.
There are several ways to do this; I find data.table to be the most convenient - and it will definitely be fastest.
df$date.time <- as.POSIXct(df$date.time) # make sure date.time is POSIXct
library(data.table)
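# interpolate var linearly onto a 30-second grid spanning the group's time range
interp.time <- function(var,dt) approx(dt,var,xout=seq(min(dt),max(dt),by="30 sec"))$y
# apply per ID/trip group; interpolating date.time against itself returns the grid as numeric seconds
result <- setDT(df)[,lapply(.SD,interp.time,dt=date.time),
                    by=list(ID,trip),
                    .SDcols=c("Latitude","Longitude","date.time")]
# convert the numeric seconds back to POSIXct
result[,date.time:=as.POSIXct(date.time, origin="1970-01-01")]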
result
# ID trip Latitude Longitude date.time
# 1: 1 1 10.30447 -109.2323 2005-01-07 11:25:26
# 2: 1 1 10.30199 -109.2328 2005-01-07 11:25:56
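If you would rather not use data.table, the same per-group idea can be sketched in base R with split() and lapply(). This is only a minimal sketch: the two-trip toy data and the helper name interp.trip are made up for illustration and are not part of your data set.

```r
# Toy data with two trips; all values are invented for illustration only.
toy <- data.frame(
  ID        = c(1, 1, 1, 1, 1, 1),
  trip      = c(1, 1, 1, 2, 2, 2),
  Latitude  = c(10.0, 10.1, 10.2, 20.0, 20.2, 20.4),
  Longitude = c(-109.0, -109.1, -109.2, -100.0, -100.2, -100.4),
  date.time = as.POSIXct(c("2005-01-07 11:00:00", "2005-01-07 11:01:00",
                           "2005-01-07 11:02:00", "2005-01-07 12:00:00",
                           "2005-01-07 12:01:00", "2005-01-07 12:02:00"),
                         tz = "UTC")
)

# Hypothetical helper: interpolate one trip onto a regular time grid.
interp.trip <- function(d, by = "30 sec") {
  grid <- seq(min(d$date.time), max(d$date.time), by = by)
  t.in  <- as.numeric(d$date.time)   # observed times as seconds
  t.out <- as.numeric(grid)          # grid times as seconds
  data.frame(
    ID        = d$ID[1],
    trip      = d$trip[1],
    Latitude  = approx(t.in, d$Latitude,  xout = t.out)$y,
    Longitude = approx(t.in, d$Longitude, xout = t.out)$y,
    date.time = grid
  )
}

# Split by ID and trip, interpolate each piece, then stack the pieces again.
pieces <- split(toy, list(toy$ID, toy$trip), drop = TRUE)
result.base <- do.call(rbind, lapply(pieces, interp.trip))
rownames(result.base) <- NULL
```

Each trip gets its own time grid here, so no values are interpolated across the gap between two trips.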
Doing this for distance is a bit more complicated because of course we can't use Euclidean distance on lon/lat data. The solution below uses distHaversine(...) in the geosphere package to calculate cumulative Haversine distances along each trip, and then interpolates on that. Here we use 50 m instead of 250 m.
library(geosphere) # for distHaversine
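# Haversine distance (in metres) between each pair of consecutive points
get.dist <- function(lon, lat) distHaversine(tail(cbind(lon,lat),-1),head(cbind(lon,lat),-1))
# cumulative distance along each trip, starting at 0 (df is already a data.table after setDT above)
df[,dist:=c(0,cumsum(get.dist(Longitude,Latitude))),by=list(ID,trip)]
# interpolate var linearly onto a 50 m grid of cumulative distance
interp.dist <- function(var,dist) approx(dist,var,xout=seq(min(dist),max(dist),by=50))$y
result <- setDT(df)[,lapply(.SD,interp.dist,dist=dist),
                    by=list(ID,trip),
                    .SDcols=c("Latitude","Longitude","dist")]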
# plot the result
plot(Latitude~Longitude,df, pch=20, asp=1)
lines(Latitude~Longitude,df, col="blue")
points(Latitude~Longitude,result, col="red")
lines(Latitude~Longitude,result, col="red")
Note that you have to set the aspect ratio of the plot to 1:1 (asp=1), or the distances are distorted.
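For reference, the cumulative distance that the interpolation relies on can also be sketched in base R, which shows what the Haversine step computes. This is an illustrative sketch only: the function name haversine is made up here, and the default radius of 6378137 m is chosen on the assumption that it matches the default used by distHaversine.

```r
# Great-circle distance in metres between lon/lat points given in degrees.
# Assumption: r = 6378137 m, intended to match distHaversine's default radius.
haversine <- function(lon1, lat1, lon2, lat2, r = 6378137) {
  to.rad <- pi / 180
  dlat <- (lat2 - lat1) * to.rad
  dlon <- (lon2 - lon1) * to.rad
  a <- sin(dlat / 2)^2 +
       cos(lat1 * to.rad) * cos(lat2 * to.rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))   # pmin guards against rounding above 1
}

# First three points of the sample data in the question.
lon <- c(-109.2323, -109.2321, -109.2326)
lat <- c(10.30447, 10.30425, 10.30314)
n <- length(lon)

# Distance of each step, then the running total starting at 0.
step <- haversine(lon[-n], lat[-n], lon[-1], lat[-1])
cum.dist <- c(0, cumsum(step))
```

Because approx() interpolates against cum.dist, repeated positions (steps of length 0) produce tied x values, so it may be worth dropping duplicate fixes before interpolating.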