Make weight column based on site distance in R


Apologies if this is a naive question, but I could not work out a solution. I have a data frame with a column named site and the sites' coordinates (long, lat). I want to add a new column named weight based on the distances between the sites.

For example:

site <- c(1, 2, 3, 4, 5)
long <- c(119.5772, 123.7172, 126.4772, 122.7972, 122.3372)
lat <- c(-31.45806, -33.75806, -31.91806, -31.91806, -31.91806)

df <- data.frame(site, long, lat)

I want to add a weight column to the data frame df according to geographic distance. In other words, I want a column named weight so that sites are weighted according to their ellipsoid distance. Thank you.

My desired output should be:

df  
  site     long       lat weight
1    1 119.5772 -31.45806  0.955
2    2 123.7172 -33.75806  0.855
3    3 126.4772 -31.91806  0.654
4    4 122.7972 -31.91806  0.358
5    5 122.3372 -31.91806  0.254

Note: The values in the weight column above are random numbers. The criterion should be that nearer sites get more weight than distant sites.


Solution

  • The distance matrix (in metres) can be calculated with geosphere::distm():

    geosphere::distm(x = df[2:3])
             [,1]     [,2]     [,3]      [,4]      [,5]
    [1,]      0.0 464760.0 656073.1 309512.28 266596.37
    [2,] 464760.0      0.0 329233.1 221489.49 241514.75
    [3,] 656073.1 329233.1      0.0 348026.93 391525.30
    [4,] 309512.3 221489.5 348026.9      0.00  43505.42
    [5,] 266596.4 241514.7 391525.3  43505.42      0.00
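
Since the question asks for ellipsoid distance, here is a minimal sketch that names the distance function explicitly instead of relying on the default of the installed geosphere version. It assumes `geosphere::distGeo` (geodesic distance on the WGS84 ellipsoid) is the distance you want. Labelling the matrix with the site ids is my own addition for readability.

```r
library(geosphere)

site <- c(1, 2, 3, 4, 5)
long <- c(119.5772, 123.7172, 126.4772, 122.7972, 122.3372)
lat  <- c(-31.45806, -33.75806, -31.91806, -31.91806, -31.91806)
df   <- data.frame(site, long, lat)

# distm() expects longitude first, then latitude.
# fun = distGeo is an assumption: it asks for the ellipsoidal (WGS84) distance.
m <- distm(x = df[, c("long", "lat")], fun = distGeo)

# Label rows and columns with the site ids so pairs can be looked up by name
dimnames(m) <- list(as.character(df$site), as.character(df$site))

# Distances are returned in metres; show them in kilometres
round(m / 1000, 1)
```

With the labels in place, `m["4", "5"]` gives the distance between sites 4 and 5, the closest pair in this data.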
    

    As per your comment below, you calculated the weight with this strategy:

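    m <- geosphere::distm(x = df[2:3])
    diag(m) <- NA                                   # ignore each site's zero distance to itself
    df$mean <- apply(m, 1, mean, na.rm = TRUE)      # mean distance to the other sites
    df <- df[order(df$mean, decreasing = TRUE), ]   # most distant site first
    df$order <- seq_len(nrow(df))
    # Without extra parentheses this evaluates as order - (min/max) - min
    df$weight <- (df$order - min(df$order)/max(df$order)-min(df$order))
    df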
    
      site     long       lat     mean order weight
    3    3 126.4772 -31.91806 431214.6     1   -0.2
    1    1 119.5772 -31.45806 424235.4     2    0.8
    2    2 123.7172 -33.75806 314249.3     3    1.8
    5    5 122.3372 -31.91806 235785.5     4    2.8
    4    4 122.7972 -31.91806 230633.5     5    3.8
    

    In my humble opinion, the same result can be obtained more directly with dplyr:

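    library(dplyr)
    library(geosphere)
    # n() is the number of rows; length(x) - 1 is the number of other sites
    df %>% mutate(order = 1 + n() - dense_rank(apply(distm(x = df[2:3]), 1, FUN = function(x){sum(x)/(length(x) - 1)})),
             weight = order - (1 + 1/n()))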
      site     long       lat order weight
    1    1 119.5772 -31.45806     2    0.8
    2    2 123.7172 -33.75806     3    1.8
    3    3 126.4772 -31.91806     1   -0.2
    4    4 122.7972 -31.91806     5    3.8
    5    5 122.3372 -31.91806     4    2.8
    

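    The logic is simple: because of operator precedence, your expression evaluates to order - min(df$order)/max(df$order) - min(df$order). min(df$order) is always 1 and max(df$order) always equals the number of rows n in your data frame, so the weight reduces to order - (1 + 1/n).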
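
To make that simplification concrete, here is a small base R check using the order values from the output above (sites 1 to 5). The last part is only my guess at what you may have intended: if the formula was meant to be a min-max scaling, extra parentheses are needed, and the weights then fall between 0 and 1, with the nearest site getting the largest weight. The variable names are made up for illustration.

```r
# Order values for sites 1..5, taken from the output above
ord <- c(2, 3, 1, 5, 4)
n   <- length(ord)

# The expression exactly as written: division binds tighter than subtraction
as_written <- ord - min(ord) / max(ord) - min(ord)

# The simplified form used in the dplyr version
simplified <- ord - (1 + 1 / n)

all.equal(as_written, simplified)

# Assumption: the intended formula may have been a min-max scaling.
# With the parentheses added, the weights lie in [0, 1].
scaled <- (ord - min(ord)) / (max(ord) - min(ord))
scaled
```

With the scaled version, site 3 (largest mean distance) gets weight 0 and site 4 (smallest mean distance) gets weight 1, which matches the criterion in the question that nearer sites should weigh more.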