Consider the following data frame:
test <- structure(list(X1 = c(1L, 2L, 3L, 4L, 2L, 5L), X2 = c(2L, 3L,
4L, 5L, 3L, 6L), X3 = c(3L, 4L, 4L, 5L, 3L, 2L), X4 = c(2L, 4L,
6L, 5L, 3L, 8L), X5 = c(1L, 3L, 2L, 4L, 6L, 4L)), .Names = c("X1",
"X2", "X3", "X4", "X5"), class = "data.frame", row.names = c(NA,
-6L))
Each column corresponds to a respondent, and each row contains the rank that each respondent assigned to a specific object. Note that the range of ranks can differ from one respondent to another.
I am trying to create a similarity measure that weights distances based on the range of each column. Here is what I have tried so far:
m <- test
# dist() measures distances between rows, so transpose to compare respondents (columns)
d <- dist(t(m), "manhattan", diag = FALSE, upper = TRUE) / nrow(m)
maxmin <- max(m, na.rm = TRUE) - min(m, na.rm = TRUE)
WeightedAgreement <- as.matrix((maxmin - d) / maxmin)
With this measure, the similarity between X1 and X3 is 0.762: the mean absolute rank difference is 10/6 ≈ 1.667, and (7 - 1.667)/7 ≈ 0.762.
The problem with my formula is that it uses the range of all values in the table, so maxmin is always 7 (8 - 1), which biases the similarities. I need to use the range of the two columns being compared rather than that of the whole table. For columns X1 and X3, maxmin should be 4 (5 - 1), and the similarity between X1 and X3 should be (4 - 1.667)/4 ≈ 0.583.
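To make the target value concrete, here is a minimal sketch that recomputes the X1/X3 similarity by hand, assuming the wanted range is that of the combined values of the two compared columns. The variable names (x1, x3, d13, range13, sim13) are illustrative only.

```r
# Rank data from the question (columns are respondents)
test <- data.frame(
  X1 = c(1, 2, 3, 4, 2, 5),
  X2 = c(2, 3, 4, 5, 3, 6),
  X3 = c(3, 4, 4, 5, 3, 2),
  X4 = c(2, 4, 6, 5, 3, 8),
  X5 = c(1, 3, 2, 4, 6, 4)
)

x1 <- test$X1
x3 <- test$X3

# Mean absolute rank difference between the two respondents: 10 / 6
d13 <- sum(abs(x1 - x3)) / length(x1)

# Range over the two compared columns only, not the whole table: 5 - 1 = 4
range13 <- max(c(x1, x3)) - min(c(x1, x3))

sim13 <- (range13 - d13) / range13
round(sim13, 3)
# [1] 0.583
```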
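If I understand correctly, you want maxmin to be the range of the combined values of the two columns being compared, which you can compute for every pair of columns as follows: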
# For each pair of columns (i, j), take the range of their combined values
maxmin <- outer(names(m), names(m),
                Vectorize(function(i, j) max(m[c(i, j)], na.rm = TRUE) -
                                         min(m[c(i, j)], na.rm = TRUE)))
# [,1] [,2] [,3] [,4] [,5]
# [1,] 4 5 4 7 5
# [2,] 5 4 4 6 5
# [3,] 4 4 3 6 5
# [4,] 7 6 6 6 7
# [5,] 5 5 5 7 5
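Plugging a pairwise maxmin into the original formula then gives the weighted agreement for every pair. The sketch below is one way to do that, assuming distances are taken between columns (hence t(m)). Instead of the Vectorize() helper it builds the same pairwise range from per-column maxima and minima, since the range of two combined columns is max(max_i, max_j) - min(min_i, min_j); the names col_max and col_min are illustrative.

```r
# Rank data from the question (columns are respondents)
test <- data.frame(
  X1 = c(1, 2, 3, 4, 2, 5),
  X2 = c(2, 3, 4, 5, 3, 6),
  X3 = c(3, 4, 4, 5, 3, 2),
  X4 = c(2, 4, 6, 5, 3, 8),
  X5 = c(1, 3, 2, 4, 6, 4)
)
m <- test

# Mean absolute rank difference between every pair of respondents (columns)
d <- as.matrix(dist(t(m), method = "manhattan")) / nrow(m)

# Pairwise range of the combined values of columns i and j
col_max <- sapply(m, max, na.rm = TRUE)
col_min <- sapply(m, min, na.rm = TRUE)
maxmin <- outer(col_max, col_max, pmax) - outer(col_min, col_min, pmin)

# Same formula as in the question, now with a column-pair-specific range
WeightedAgreement <- (maxmin - d) / maxmin
round(WeightedAgreement["X1", "X3"], 3)
# [1] 0.583
```

This maxmin has the same values as the matrix above but keeps the column names as dimnames. If a respondent gives every object the same rank, that column's diagonal range is 0 and the corresponding diagonal entry becomes NaN.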