Meaning of "[tmp c]=max(S(:,I),[],2); c(I)=1:K;" in MATLAB


I am trying to translate the reference implementation of AP Clustering into C++. This sequence of statements baffles me because it looks as if vector c gets filled with some numbers and is then promptly overwritten with a different set of numbers, which makes no sense. Here is the MATLAB code:

[tmp c]=max(S(:,I),[],2); c(I)=1:K;

The R implementation has something very similar:

    c <- max.col(s[, I], ties.method="first")
    c[I] <- 1:K

It is true that s is NxN while I is of length K << N. However, as I read it, c is the same size as I, so the final value is just a permutation of 1:K that depends on I but not on the result of the first statement.

I thought I knew what each statement does, but the combination is a mystery. Please set me straight.


Solution

  • Speaking about the R version: if

    s is NxN while I is of length K << N.

    then

    • s[,I] has dimensions N (rows) x K (columns)
    • according to ?max.col, max.col "[f]ind[s] the maximum position for each row of a matrix" (emphasis added)
    • thus c is of length N (i.e., equal to the number of rows of s[, I]), not of length K

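    If I is of length K, then assigning values to c[I] will fill in only K of the N values of c, leaving the other N-K at the values computed by max.col. Those values are column positions within s[, I], so they already lie in 1..K; the second statement only overwrites the K entries that belong to the points listed in I, setting the entry for I[j] to j.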

    R does vectorized assignment, so c[I] <- 1:K is equivalent to (but faster than)

    for (j in seq(K)) {
        c[I[j]] <- j
    }
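
Since the goal is a C++ translation, here is a minimal sketch of the same two steps. It is not taken from the reference implementation: the function name assign_clusters, the nested-vector representation of S, and the switch to 0-based indices (so labels run 0..K-1 instead of 1..K) are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (name and signature are illustrative only).
// S         : N x N similarity matrix, S[i][j] = similarity of point i to point j
// exemplars : the K indices called I in the question, here 0-based
// returns   : vector c of length N with labels in 0..K-1
std::vector<std::size_t> assign_clusters(
    const std::vector<std::vector<double>>& S,
    const std::vector<std::size_t>& exemplars)
{
    assert(!exemplars.empty());
    const std::size_t N = S.size();
    const std::size_t K = exemplars.size();
    std::vector<std::size_t> c(N);

    // Step 1: [tmp c] = max(S(:,I),[],2)  /  c <- max.col(s[, I], ties.method="first")
    // For every row, find which of the K exemplar columns holds the largest value.
    for (std::size_t i = 0; i < N; ++i) {
        std::size_t best = 0;
        for (std::size_t k = 1; k < K; ++k) {
            // Strict '>' keeps the first maximum when there are ties.
            if (S[i][exemplars[k]] > S[i][exemplars[best]]) {
                best = k;
            }
        }
        c[i] = best;
    }

    // Step 2: c(I) = 1:K  /  c[I] <- 1:K
    // Only the K entries that belong to exemplars are overwritten.
    for (std::size_t k = 0; k < K; ++k) {
        c[exemplars[k]] = k;
    }
    return c;
}
```

With a 4 x 4 matrix and exemplars {1, 3}, step 1 fills all four entries of c and step 2 touches only c[1] and c[3]; the entries for points 0 and 2 keep whatever the row-wise maximum chose.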