
Creating a similarity measure that is weighted by column range


Consider the following data frame, test:

test <- structure(list(X1 = c(1L, 2L, 3L, 4L, 2L, 5L),
                       X2 = c(2L, 3L, 4L, 5L, 3L, 6L),
                       X3 = c(3L, 4L, 4L, 5L, 3L, 2L),
                       X4 = c(2L, 4L, 6L, 5L, 3L, 8L),
                       X5 = c(1L, 3L, 2L, 4L, 6L, 4L)),
                  .Names = c("X1", "X2", "X3", "X4", "X5"),
                  class = "data.frame", row.names = c(NA, -6L))

Each column corresponds to a respondent, and each row contains the rank that the respondent assigned to a specific object. Note that the range of ranks can differ from one respondent to another.
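To see how much the ranges actually differ, one can look at each respondent's smallest and largest rank. A minimal sketch, re-creating the test data so the snippet runs on its own; col_ranges is just an illustrative name:

```r
# Example data: columns are respondents, rows are objects
test <- data.frame(X1 = c(1L, 2L, 3L, 4L, 2L, 5L),
                   X2 = c(2L, 3L, 4L, 5L, 3L, 6L),
                   X3 = c(3L, 4L, 4L, 5L, 3L, 2L),
                   X4 = c(2L, 4L, 6L, 5L, 3L, 8L),
                   X5 = c(1L, 3L, 2L, 4L, 6L, 4L))

# Row 1 = smallest rank each respondent used, row 2 = largest
col_ranges <- sapply(test, range, na.rm = TRUE)
col_ranges
#      X1 X2 X3 X4 X5
# [1,]  1  2  2  2  1
# [2,]  5  6  5  8  6

# Width of each respondent's scale
col_ranges[2, ] - col_ranges[1, ]
# X1 X2 X3 X4 X5
#  4  4  3  6  5
```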

I am trying to create a similarity measure that weights distances based on the range of each column. Here is what I have tried so far:

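m <- test
# dist() compares rows, so transpose to get distances between respondents (columns),
# then divide by the number of objects to get the mean absolute rank difference
d <- dist(t(m), method = "manhattan", diag = FALSE, upper = TRUE) / nrow(m)
# Range of all ranks in the whole table
maxmin <- max(m, na.rm = TRUE) - min(m, na.rm = TRUE)
# Convert d to a full matrix first so the diagonal (self-similarity) comes out as 1
WeightedAgreement <- (maxmin - as.matrix(d)) / maxmin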
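One detail worth checking here: dist() measures distances between rows, so on the data frame as given it compares the 6 objects, not the 5 respondents. A quick check of the dimensions, re-creating the data so the snippet is self-contained:

```r
# Example data: columns are respondents, rows are objects
test <- data.frame(X1 = c(1L, 2L, 3L, 4L, 2L, 5L),
                   X2 = c(2L, 3L, 4L, 5L, 3L, 6L),
                   X3 = c(3L, 4L, 4L, 5L, 3L, 2L),
                   X4 = c(2L, 4L, 6L, 5L, 3L, 8L),
                   X5 = c(1L, 3L, 2L, 4L, 6L, 4L))

row_d <- as.matrix(dist(test, method = "manhattan"))     # objects vs objects
col_d <- as.matrix(dist(t(test), method = "manhattan"))  # respondents vs respondents

dim(row_d)          # [1] 6 6
dim(col_d)          # [1] 5 5
col_d["X1", "X3"]   # [1] 10
```

Transposing with t() is what makes the X1/X3 entry exist at all; dividing it by nrow(test) = 6 then gives the 1.667 used below.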

With this measure, the similarity between X1 and X3 is about 0.762: their Manhattan distance is 10, so d = 10/6 ≈ 1.667, and (7 - 1.667)/7 ≈ 0.762.

The problem with my formula is that it uses the range of all values in the table, so maxmin is always 7 (8 - 1), which biases the similarities. I need to use the range of the two columns being compared instead. For X1 and X3, maxmin should be 4 (5 - 1), which gives a similarity of (4 - 1.667)/4 ≈ 0.583.
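Both numbers can be checked by hand for the X1/X3 pair; only the denominator changes between the two versions. A minimal sketch with the two columns copied from test; similarity is a hypothetical helper introduced here, not part of the question's code:

```r
# Ranks given by respondents X1 and X3 (copied from test)
x1 <- c(1, 2, 3, 4, 2, 5)
x3 <- c(3, 4, 4, 5, 3, 2)

# Mean absolute rank difference: Manhattan distance / number of objects
d13 <- sum(abs(x1 - x3)) / length(x1)   # 10 / 6

# Hypothetical helper: turn a distance into a similarity given a normalizing range
similarity <- function(d, rng) (rng - d) / rng

table_range <- 8 - 1                        # max and min of the whole table
pair_range  <- max(x1, x3) - min(x1, x3)    # 5 - 1 = 4

round(similarity(d13, table_range), 3)   # 0.762 (current formula)
round(similarity(d13, pair_range), 3)    # 0.583 (desired result)
```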


Solution

  • If I understand correctly, you want to define maxmin as the range of each pair of columns:

    # For each pair of columns (i, j): largest minus smallest rank across both columns
    maxmin <- outer(names(m), names(m),
                    Vectorize(function(i, j) max(m[c(i, j)], na.rm = TRUE) -
                                            min(m[c(i, j)], na.rm = TRUE)))
    
    #      [,1] [,2] [,3] [,4] [,5]
    # [1,]    4    5    4    7    5
    # [2,]    5    4    4    6    5
    # [3,]    4    4    3    6    5
    # [4,]    7    6    6    6    7
    # [5,]    5    5    5    7    5
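To get the similarities, this maxmin still has to be combined with the column-wise distance matrix. A minimal sketch, assuming d holds Manhattan distances between columns (hence t(m)) divided by the number of objects, as in the question. It also shows a loop-free way to build the same maxmin, using the fact that the range of two columns is the larger of their maxima minus the smaller of their minima; col_max, col_min and maxmin2 are illustrative names:

```r
# Example data: columns are respondents, rows are objects
m <- data.frame(X1 = c(1L, 2L, 3L, 4L, 2L, 5L),
                X2 = c(2L, 3L, 4L, 5L, 3L, 6L),
                X3 = c(3L, 4L, 4L, 5L, 3L, 2L),
                X4 = c(2L, 4L, 6L, 5L, 3L, 8L),
                X5 = c(1L, 3L, 2L, 4L, 6L, 4L))

# Mean absolute rank difference between every pair of columns (full 5 x 5 matrix)
d <- as.matrix(dist(t(m), method = "manhattan")) / nrow(m)

# Pairwise range, as in the answer
maxmin <- outer(names(m), names(m),
                Vectorize(function(i, j) max(m[c(i, j)], na.rm = TRUE) -
                                        min(m[c(i, j)], na.rm = TRUE)))

# Same matrix without Vectorize: max(max_i, max_j) - min(min_i, min_j)
col_max <- sapply(m, max, na.rm = TRUE)
col_min <- sapply(m, min, na.rm = TRUE)
maxmin2 <- outer(col_max, col_max, pmax) - outer(col_min, col_min, pmin)
stopifnot(all(maxmin == maxmin2))

# Similarity: 1 when two columns agree exactly, lower as they diverge
WeightedAgreement <- (maxmin - d) / maxmin
dimnames(WeightedAgreement) <- list(names(m), names(m))
round(WeightedAgreement["X1", "X3"], 3)
# [1] 0.583
```

One caveat: a respondent who gives every object the same rank has a range of 0, so their diagonal entry would be 0/0 = NaN.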