I am trying to translate the reference implementation of AP Clustering into C++. This sequence of statements baffles me because it looks as if vector c gets filled with some numbers and is then promptly overwritten with a different set of numbers, which makes no sense. Here is the Matlab code:
[tmp c]=max(S(:,I),[],2); c(I)=1:K;
The R implementation has something very similar:
c <- max.col(s[, I], ties.method="first")
c[I] <- 1:K
It is true that s is NxN while I is of length K << N. However, as I read it, c is the same size as I, so the final value is just a permutation of 1:K that depends on I but not on the result of the first statement.
I thought I knew what each statement does, but the combination is a mystery. Please set me straight.
Speaking about the R version: if
s is NxN while I is of length K << N.
then
- s[,I] has dimensions N (rows) x K (columns).
- From ?max.col: max.col "[f]ind[s] the maximum position for each row of a matrix" (emphasis added).
- c is of length N (i.e., equal to the number of rows of s[,I]).
- If I is of length K, then assigning values to c[I] will fill in only K of the N values of c, leaving the other N-K equal to their original values.
R does vectorized assignment, so c[I] <- 1:K is equivalent to (but faster than)
for (j in seq(K)) {
    c[I[j]] <- j
}
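
Since the goal is a C++ translation, here is a minimal sketch of how the two statements might carry over. It is not taken from the reference implementation: the function name assignClusters, the parameter name exemplars (standing in for I), and the nested-vector matrix layout are all assumptions made for illustration, and indices are 0-based rather than 1-based as in Matlab and R.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (name and signature are assumptions, not part of
// the reference implementation).
// s         : N x N similarity matrix, s[i][j] = similarity of point i to point j
// exemplars : the K exemplar indices (0-based), playing the role of I
// returns   : vector c of length N holding a cluster index in [0, K) per point
std::vector<std::size_t> assignClusters(
    const std::vector<std::vector<double>>& s,
    const std::vector<std::size_t>& exemplars)
{
    assert(!exemplars.empty());
    const std::size_t n = s.size();
    const std::size_t k = exemplars.size();
    std::vector<std::size_t> c(n);  // length N, not K

    // Step 1, like max.col(s[, I], ties.method = "first"):
    // for each row, the position of the maximum among the K exemplar columns.
    for (std::size_t i = 0; i < n; ++i) {
        std::size_t best = 0;
        for (std::size_t j = 1; j < k; ++j) {
            // Strict '>' keeps the first maximum when there are ties.
            if (s[i][exemplars[j]] > s[i][exemplars[best]]) {
                best = j;
            }
        }
        c[i] = best;
    }

    // Step 2, like c[I] <- 1:K (0-based here): overwrite only the K entries
    // that belong to the exemplars; the other N-K entries keep their values.
    for (std::size_t j = 0; j < k; ++j) {
        c[exemplars[j]] = j;
    }
    return c;
}
```

The second loop touches only K of the N entries, so the result of the first step survives for every non-exemplar point; the overwrite just forces each exemplar into its own cluster.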