
Calculate the weighted distance from a modified distance matrix


I have a modified distance matrix, and I want to use the transformed (normalized) distances to create a new variable. The code below produces some example data.

set.seed(12)

size <- sample(100:1000, 7)
var <- c("V3", "V4", "V5", "V6", "V7", "V8", "V9")
dist <- matrix(runif(49), nrow = 7, ncol = 7)
diag(dist) <- 0

df <- as.data.frame(cbind(var, size, dist))

This leads to a dataset looking like this:

  var size                  V3                V4                 V5                V6                V7                 V8                V9
1  V3  549                   0 0.264918377622962  0.787836347473785 0.439429325051606 0.941087544662878   0.97763589094393 0.774718186818063
2  V4  445  0.0228777434676886                 0 0.0978530396241695 0.669819295872003 0.693911424372345  0.197649595327675 0.394586439244449
3  V5  435 0.00832482660189271 0.457607151241973                  0 0.240883231163025 0.843702238984406  0.844225987326354 0.361513090785593
4  V6  346   0.392697197152302 0.540707547217607  0.217823043232784                 0 0.384644460165873 0.0950279189273715 0.421090044546872
5  V7  958   0.813880559289828 0.665679829893634  0.267943592974916 0.882756386883557                 0  0.381151003297418 0.322011524345726
6  V8  273    0.37624845537357 0.112698937533423  0.504767951788381 0.814063254510984  0.58848182996735                  0 0.552160830702633
7  V9  552   0.380812183720991  0.21836716751568  0.188586926786229 0.633264608215541 0.530477509833872  0.152623838977888                 0
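
One detail of this construction matters for any later arithmetic: because var is a character vector, cbind(var, size, dist) builds a character matrix, so every column of df ends up stored as text (or as a factor in R versions before 4.0) rather than as a number. Below is a minimal sketch of a variant that keeps size and the distances numeric. The name df_num and the explicit colnames() call are illustrative additions, not part of the original code.

```r
set.seed(12)

size <- sample(100:1000, 7)
var <- c("V3", "V4", "V5", "V6", "V7", "V8", "V9")
dist <- matrix(runif(49), nrow = 7, ncol = 7)
diag(dist) <- 0

# Name the distance columns explicitly (assumption: same labels as var)
colnames(dist) <- var

# data.frame() keeps each column's own type instead of coercing to character
df_num <- data.frame(var, size, dist)

str(df_num)  # size is integer, V3..V9 are numeric
```

With a data frame built this way, the distance columns can be used in arithmetic directly, without converting them back from text first.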

The data contain several variables giving the distance between var and the other points. The columns V3, V4, and so on identify the other point; for example, the distance from var == V4 to V5 is in the column called V5. The size column gives the size of each point.

What I want to do is calculate a weighted sum of distances, where each distance is weighted according to the size of the other point. See the formula below:

[image: formula]

where Si is the size of unit i (the variable called size), Di is the normalized distance from a given point (i.e. column V3, V4, V5, ...) to the i-th point, and the summation is over all k units.

For example, Di can be the distance from the given point V3 to V4 (0.264918377622962), in which case Si is the size of var == V4 (i.e. 445).

How do I perform this calculation when my data looks like this?

Thanks!


Solution

  • Perhaps this is what you are looking for?

    Working column-wise, we divide the size of each point by its distance from the point that the column represents (1:7), excluding the diagonal. Summing the result gives the weighted size for that point.

    set.seed(12)
    
    size <- sample(100:1000, 7)
    var <- c("V3", "V4", "V5", "V6", "V7", "V8", "V9")
    dist <- matrix(runif(49), nrow = 7, ncol = 7)
    diag(dist) <- 0
    
    df <- as.data.frame(cbind(var, size, dist))
    
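    # For each point i: sum size[j] / dist[j, i] over all other points j.
    # Columns 1-2 of df are var and size, so point i's distances are in column i + 2;
    # cbind() stored everything as text, hence the numeric conversion.
    df$WS <- sapply(seq_len(nrow(df)),
             function(i) sum(as.numeric(as.character(df[[2]][-i])) /
                             as.numeric(as.character(df[[i + 2]][-i]))))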
    
    df$WS
    #> [1] 75937.840 10052.202 13876.181  6011.826  4144.254 13099.493  7330.831
    

    Created on 2020-11-13 by the reprex package (v0.3.0)
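
The same column-wise sum can also be written without an explicit loop. The sketch below assumes size and dist are the numeric vector and matrix from the example data, and that the intended quantity for point j is the sum of size[i] / dist[i, j] over all other points i, which is what the sapply() call computes. The names ratio, ws_vec and ws_loop are illustrative.

```r
set.seed(12)

size <- sample(100:1000, 7)
dist <- matrix(runif(49), nrow = 7, ncol = 7)
diag(dist) <- 0

# size is recycled down each column, so ratio[i, j] = size[i] / dist[i, j]
ratio <- size / dist

# The diagonal is size[i] / 0 = Inf; zero it so it drops out of the sums
diag(ratio) <- 0

# Column sums give one weighted value per point
ws_vec <- colSums(ratio)

# Loop version for comparison, mirroring the sapply() call above
ws_loop <- sapply(seq_len(nrow(dist)),
                  function(j) sum(size[-j] / dist[-j, j]))

all.equal(ws_vec, ws_loop)
```

If the distances are meant to be read row-wise instead (from var to the column's point), the same approach should work on t(dist) in place of dist. Which orientation is correct depends on the formula in the question, so treat that as an assumption to check.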