
Performing a function on all possible combinations of a subset of DF columns in R


I'd like to calculate the distance between row-wise pairs of lat/long coordinates. This is easily done with a variety of functions like earth.dist. Where I am stuck is that I'd like this to be part of a nightly data quality check process where the number of pairs changes. Each row is a unique subject/person. Some days a few subjects could have four sets of coordinates; on other days the most any subject has might be three. Is there an elegant way to perform this calculation using, e.g., all of the possible combinations formed by:

combn(geototal, 2)

, where geototal is the number of coordinate sets on a given day, e.g. geototal = 4 for the set:

latitude.1, longitude.1, latitude.2, longitude.2, latitude.3, longitude.3, latitude.4, longitude.4.

My current loop looks like this, but of course it misses many possible combinations, especially as geototal gets larger than 4.

x = 1; y = 2 
while(x <= geototal) 
{
  if (y > geototal) break;
  eval(parse(text = sprintf("df$distance%d_%d = earth.dist(longitude.%d,latitude.%d,longitude.%d,latitude.%d)", x, y, x, x, y, y)));
  x <- x + 1; 
  y <- y + 1;
}

Thank you for any thoughts on this!


Solution

  • Try something like this

    # Using a built-in dataset from the fossil package
    library(fossil)
    data(fdata.lats)
    df = fdata.lats@coords
    
    # Function to calculate pairwise distances between rows
    foo = function(df) {
      # Number of points (rows); these get paired up below
      n = nrow(df)
      # All combinations of two row indices, one pair per row
      l = t(combn(n, 2))
      # For each pair, calculate the distance and collect the results in a vector
      t = apply(l, 1, function(x) {earth.dist(df[x,])})
      # Return the pairs alongside their distances; modify here if you want to print something instead
      cbind(l, t)
    }
    
    # Test run
    foo(df)
    
                        t
     [1,]  1  2  893.4992
     [2,]  1  3  776.3101
     [3,]  1  4 1101.1145
     [4,]  1  5 1477.4800
     [5,]  1  6  444.9052
     [6,]  1  7  456.5888
     [7,]  1  8 1559.4614
     [8,]  1  9 1435.2985
     [9,]  1 10 1481.0119
    [10,]  1 11 1152.0352
    [11,]  1 12  870.4960
    [12,]  2  3  867.2648
    [13,]  2  4  777.6345
    [14,]  2  5  860.9163
    ...
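
To get back to the layout in the question (one row per subject, with columns latitude.1, longitude.1, latitude.2, ...), the same combn idea can be applied to the coordinate-set indices instead of the row indices. What follows is only a rough sketch under a few assumptions: the columns are named exactly as in the question, the new columns reuse the distance%d_%d naming from the asker's loop, and the distance is a plain base-R haversine on a sphere of radius 6371 km. The helper names haversine_km and add_pair_distances are made up for illustration, and the haversine stands in for earth.dist, so values may differ slightly from what fossil returns.

```r
# Great-circle distance in km via the haversine formula (base R only).
# Assumption: spherical Earth with radius 6371 km; this is a stand-in
# for earth.dist, not a reimplementation of it.
haversine_km <- function(lon1, lat1, lon2, lat2, radius = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))
}

# Add one distance column per pair of coordinate sets, computed row by row.
# Assumes columns named latitude.<k> and longitude.<k> for k = 1..geototal.
add_pair_distances <- function(df, geototal) {
  if (geototal < 2) return(df)      # nothing to pair up
  pairs <- combn(geototal, 2)       # one column per (i, j) pair with i < j
  for (k in seq_len(ncol(pairs))) {
    i <- pairs[1, k]
    j <- pairs[2, k]
    df[[sprintf("distance%d_%d", i, j)]] <- haversine_km(
      df[[paste0("longitude.", i)]], df[[paste0("latitude.", i)]],
      df[[paste0("longitude.", j)]], df[[paste0("latitude.", j)]]
    )
  }
  df
}

# Made-up data: two subjects, the second one missing a third coordinate set
df <- data.frame(
  latitude.1 = c(0, 10), longitude.1 = c(0, 20),
  latitude.2 = c(0, 10), longitude.2 = c(1, 20),
  latitude.3 = c(1, NA), longitude.3 = c(0, NA)
)
result <- add_pair_distances(df, geototal = 3)
names(result)
```

Because the pairs come from combn(geototal, 2), nothing needs to change when the largest number of coordinate sets differs from night to night, and no eval(parse(...)) is required. Subjects with fewer coordinate sets than geototal simply get NA in the distance columns that involve the missing sets.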