R: Inserting Mid Values of Data Frame Row Pairs


I have a series of coordinates from Strava, recorded every 2.5 minutes, which I add to a QGIS map. I want to interpolate the points in between by taking the mean of the latitude and longitude of each pair.

I know I could use a for loop, but I'd rather use one of the apply family of functions. I know I need to take the current row and the next row for all but the last row.

gpsSmall is a data.frame that looks like this:

activity_no lat     lon
----------- ---     ---
1           52.5111 -1.85222
1           52.5111 -1.86224
1           52.5111 -1.87226
... etc
2           52.6189 -1.85332
2           52.6284 -1.86332
2           52.6386 -1.87332
... etc

I've then written this code to create the extra rows, which I will rbind onto the end.

splitPoints <- function(point1, point2) {
    meanLatitude = (point1$lat + point2$lat)/2
    meanLongitude = (point1$lon + point2$lon)/2

    point1$lat = meanLatitude
    point1$lon = meanLongitude

    point1
}

newPoints <- sapply(seq_len(nrow(gpsSmall) - 1),
       function(i){
           splitPoints(gpsSmall[i,], gpsSmall[i+1,])
       })

However, newPoints is a matrix of 3 (the number of columns in gpsSmall) x 66 (the number of rows in gpsSmall minus 1). What am I doing wrong?


Solution

  • This does not use the apply functions, but something like this may make it a bit easier. I assumed you wanted activity_no to act as a grouping variable, so that points are only interpolated within an activity. If not, it's even easier: just use the approx function as done below on the whole data set instead of splitting it first.

    A couple of tidyverse packages:

    library(dplyr)
    library(purrr)
    

    Load your data snippet:

    dat <- tribble(
      ~activity_no, ~lat, ~lon,
      1,           52.5111, -1.85222,
      1,           52.5111, -1.86224,
      1,           52.5111, -1.87226,
      2,           52.6189, -1.85332,
      2,           52.6284, -1.86332,
      2,           52.6386, -1.87332
    )
    

    And now just do linear interpolation using approx (see ?approx). Setting the length of the interpolation output to n * 2 - 1, where n is the number of rows in the group, places exactly one new value between each pair of real observations. Since the interpolation is linear and the observations are treated as equally spaced, that new value is the mean of the pair. You could change the output length to get a finer level of interpolation if you wanted it.

    dat %>%
      split(dat$activity_no) %>%
      map_dfr( ~ data.frame(activity_no = rep(.$activity_no[1], nrow(.) * 2 - 1),
                    lat = approx(.$lat, n = nrow(.) * 2 - 1)$y,
                    lon = approx(.$lon, n = nrow(.) * 2 - 1)$y))
    
       activity_no      lat      lon
    1            1 52.51110 -1.85222
    2            1 52.51110 -1.85723
    3            1 52.51110 -1.86224
    4            1 52.51110 -1.86725
    5            1 52.51110 -1.87226
    6            2 52.61890 -1.85332
    7            2 52.62365 -1.85832
    8            2 52.62840 -1.86332
    9            2 52.63350 -1.86832
    10           2 52.63860 -1.87332
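
For completeness, the matrix in the question comes from sapply simplifying its results: each call returns a one-row data frame, which is a list of three columns, and sapply binds those lists into a 3 x (n - 1) matrix. A minimal base R sketch that stays with the apply family is to use lapply, which never simplifies, and rbind the pieces. The helper name midpoints is hypothetical, and the sample data is just the snippet from the question.

```r
# Sample data copied from the question (only the rows shown there).
gpsSmall <- data.frame(
  activity_no = c(1, 1, 1, 2, 2, 2),
  lat = c(52.5111, 52.5111, 52.5111, 52.6189, 52.6284, 52.6386),
  lon = c(-1.85222, -1.86224, -1.87226, -1.85332, -1.86332, -1.87332)
)

# Hypothetical helper (not from the original post): midpoints of consecutive
# rows within one activity, returned as a data frame.
midpoints <- function(df) {
  if (nrow(df) < 2) return(df[0, ])  # a single point has no neighbour
  rows <- lapply(seq_len(nrow(df) - 1), function(i) {
    data.frame(
      activity_no = df$activity_no[i],
      lat = (df$lat[i] + df$lat[i + 1]) / 2,
      lon = (df$lon[i] + df$lon[i + 1]) / 2
    )
  })
  do.call(rbind, rows)  # stack the one-row data frames
}

# Split by activity so no midpoint is computed across two activities.
newPoints <- do.call(rbind, lapply(split(gpsSmall, gpsSmall$activity_no), midpoints))
rownames(newPoints) <- NULL
```

newPoints then holds only the midpoints, which can be rbind-ed onto gpsSmall as originally planned.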