Tags: r, gps, interpolation, spatial-interpolation

interpolate missing lat, lon in dataframe with multiple trips per individual


I have the following dataframe (df) and would like to interpolate Lat, Lon coordinates at an equidistant interval (e.g. every 250 m) or time interval (e.g. every 2 min).

> head(df)
   ID Latitude Longitude  trip   date.time
1  1 10.30447 -109.2323    1 2005-01-07 11:25:26
2  1 10.30425 -109.2321    1 2005-01-07 11:25:36
3  1 10.30314 -109.2326    1 2005-01-07 11:25:46
4  1 10.30199 -109.2328    1 2005-01-07 11:25:56
5  1 10.30079 -109.2334    1 2005-01-07 11:26:06
6  1 10.30006 -109.2331    1 2005-01-07 11:26:16

I tried to do this using R package zoo and the following code I found in a similar question posted:

full.time    <- with(df,seq(date.time[1],tail(date.time,1),by=1))
library(zoo)
df.zoo <- zoo(df[,3:4],df$date.time)        # convert to zoo object
result <- na.approx(df.zoo,xout=full.time)  # interpolate; result is also a zoo object
head(result)

However, because my dataframe includes multiple trips (df$trip) of multiple individuals (df$ID), the timestamps are not unique and I get the following warning:

> df.zoo <- zoo(df[,3:4],df$date.time)        # convert to zoo object
Warning message:
In zoo(df[, 3:4], df$datetime) :
some methods for “zoo” objects do not work if the index entries in ‘order.by’ are not unique

How can I run above code (in a loop?) accounting for individual trips?


Solution

  • Your sample is not representative: you ask for interpolation in 2 min increments, but the data set spans less than 2 min, so this example uses 30 sec increments. Also, you provide only one ID/trip combination, so it is impossible to verify that this works across multiple trips. Nevertheless it should.

    There are several ways to do this; I find data.table to be the most convenient, and it is likely to be the fastest.

    df$date.time <- as.POSIXct(df$date.time)  # make sure date.time is POSIXct
    library(data.table)
    interp.time <- function(var,dt) approx(dt,var,xout=seq(min(dt),max(dt),by="30 sec"))$y
    result <- setDT(df)[,lapply(.SD,interp.time,dt=date.time), 
                         by=list(ID,trip), 
                         .SDcols=c("Latitude","Longitude","date.time")]
    result[,date.time:=as.POSIXct(date.time, origin="1970-01-01")]
    result
    #    ID trip Latitude Longitude           date.time
    # 1:  1    1 10.30447 -109.2323 2005-01-07 11:25:26
    # 2:  1    1 10.30199 -109.2328 2005-01-07 11:25:56
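
Since the question asks about handling trips in a loop, one alternative is a base R sketch that splits the data by ID and trip and interpolates each piece on its own time grid. The toy data frame df2, the helper interp_trip and the 20 sec step are made up for illustration and are not part of the original answer.

```r
# Toy data: two individuals whose trips share the same timestamps
# (made-up values, for illustration only)
df2 <- data.frame(
  ID        = c(1, 1, 1, 2, 2, 2),
  trip      = c(1, 1, 1, 1, 1, 1),
  Latitude  = c(10.0, 10.1, 10.2, 20.0, 20.3, 20.6),
  Longitude = c(-109.0, -109.1, -109.2, -50.0, -50.3, -50.6),
  date.time = as.POSIXct("2005-01-07 11:25:00", tz = "UTC") + c(0, 30, 60, 0, 30, 60)
)

# Interpolate a single trip onto a regular time grid (hypothetical helper)
interp_trip <- function(d, step = "20 sec") {
  grid <- seq(min(d$date.time), max(d$date.time), by = step)
  t_in  <- as.numeric(d$date.time)   # seconds since epoch
  t_out <- as.numeric(grid)
  data.frame(
    ID        = d$ID[1],
    trip      = d$trip[1],
    Latitude  = approx(t_in, d$Latitude,  xout = t_out)$y,
    Longitude = approx(t_in, d$Longitude, xout = t_out)$y,
    date.time = grid
  )
}

# One piece per ID/trip combination, interpolated separately, then stacked
pieces  <- split(df2, list(df2$ID, df2$trip), drop = TRUE)
result2 <- do.call(rbind, lapply(pieces, interp_trip))
rownames(result2) <- NULL
```

Because each piece contains only one trip, duplicate timestamps across individuals never end up in the same interpolation call.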
    

    Doing this for distance is a bit more complicated because we can't use Euclidean distance on lon/lat data. The solution below uses distHaversine(...) in the geosphere package to calculate cumulative Haversine distances along each trip, and then interpolates on that. Here we use 50 m instead of 250 m.

    library(geosphere)    # for distHaversine
    get.dist <- function(lon, lat) distHaversine(tail(cbind(lon,lat),-1),head(cbind(lon,lat),-1))
    setDT(df)[,dist:=c(0,cumsum(get.dist(Longitude,Latitude))),by=list(ID,trip)]  # setDT so this also works on a plain data.frame
    
    interp.dist <- function(var,dist) approx(dist,var,xout=seq(min(dist),max(dist),by=50))$y
    result <- setDT(df)[,lapply(.SD,interp.dist,dist=dist), 
                        by=list(ID,trip), 
                        .SDcols=c("Latitude","Longitude","dist")]
    
    # plot the result
    plot(Latitude~Longitude,df, pch=20, asp=1)
    lines(Latitude~Longitude,df, col="blue")
    points(Latitude~Longitude,result, col="red")
    lines(Latitude~Longitude,result, col="red")
    

    Note that you have to set the aspect ratio of the plot to 1:1 (asp=1) or the distances are distorted.
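
To make the distance-based step concrete, here is a base R sketch that computes the cumulative Haversine distance for the six sample points from the question and interpolates every 50 m, without geosphere or data.table. The function haversine_m is a hypothetical stand-in for distHaversine, and the Earth radius of 6378137 m is an assumption chosen to match what I believe is the geosphere default.

```r
# Haversine distance in metres between paired points (vectorised).
# r = 6378137 m is an assumed Earth radius.
haversine_m <- function(lon1, lat1, lon2, lat2, r = 6378137) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
       cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# The six sample points shown in the question
lat <- c(10.30447, 10.30425, 10.30314, 10.30199, 10.30079, 10.30006)
lon <- c(-109.2323, -109.2321, -109.2326, -109.2328, -109.2334, -109.2331)

# Segment lengths and cumulative distance along the track
n    <- length(lat)
seg  <- haversine_m(lon[-n], lat[-n], lon[-1], lat[-1])
dist <- c(0, cumsum(seg))

# Interpolate both coordinates at 50 m steps of along-track distance
grid  <- seq(0, max(dist), by = 50)
lat_i <- approx(dist, lat, xout = grid)$y
lon_i <- approx(dist, lon, xout = grid)$y

# Straight-line gaps between consecutive interpolated points
gap <- haversine_m(head(lon_i, -1), head(lat_i, -1),
                   tail(lon_i, -1), tail(lat_i, -1))
```

The points are spaced 50 m apart along the recorded track, so the straight-line gap between two neighbouring interpolated points can be shorter than 50 m wherever the track bends.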