
R: how to rank longitudinal data


> dput(subset)
structure(list(MEMORY1 = c(1L, 1L, 1L, 1L, 2L), MEMORY2 = c(1L, 
1L, 1L, 1L, 1L), MEMORY3 = c(1L, 2L, 1L, 1L, 1L), MEMORY4 = c(2L, 
2L, 2L, 2L, 2L), MEMORY5 = c(1L, 2L, 1L, 2L, 1L), MEMORY6 = c(1L, 
1L, 2L, 1L, 2L), MEMORY7 = c(2L, 2L, 2L, 2L, 1L), MEMORY8 = c(1L, 
1L, 1L, 1L, 1L)), .Names = c("MEMORY1", "MEMORY2", "MEMORY3", 
"MEMORY4", "MEMORY5", "MEMORY6", "MEMORY7", "MEMORY8"), row.names = c(NA, 
-5L), class = "data.frame")

> subset
  MEMORY1 MEMORY2 MEMORY3 MEMORY4 MEMORY5 MEMORY6 MEMORY7 MEMORY8
1       1       1       1       2       1       1       2       1
2       1       1       2       2       2       1       2       1
3       1       1       1       2       1       2       2       1
4       1       1       1       2       2       1       2       1
5       2       1       1       2       1       2       1       1

My data has 8 items (columns) recorded at 5 time intervals (rows). I would like to rank the columns as follows: 1) if a column contains only 1s, it gets rank 8; 2) otherwise, its rank depends on the row in which a value greater than 1 first appears (for MEMORY1 that is row 5, for MEMORY3 row 2, for MEMORY4 row 1, and so forth). I wrote the following loop to do this.

ranks <- rep(0, 8)
for (i in 1:8) {
  v <- which(subset[[i]] > 1)  # rows where column i exceeds 1
  if (length(v) == 0) {
    ranks[i] <- 8              # column contains only 1s
  } else ranks[i] <- v[1]      # first row with a value > 1
}
> ranks
[1] 5 8 2 1 2 3 1 8

This works, but I realized that because of ties (MEMORY4 and MEMORY7 are both ranked 1), MEMORY3 and MEMORY5 should be ranked 3 instead of 2. In that case MEMORY6 should be ranked 5 rather than 3, and MEMORY1 6 rather than 5. So the desired ranking is:

6 8 3 1 3 5 1 8
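The ranking described above is what R's `rank()` calls `ties.method = "min"` (standard competition ranking: tied values share the lowest rank and the following ranks are skipped). As a quick check, a minimal sketch that applies it directly to the loop's output: the columns with a value greater than 1 come out right, but the two all-1s columns tie with each other and land on 7, so they need separate handling.

```r
# Output of the question's loop: first row with a value > 1, or 8 if none
ranks <- c(5, 8, 2, 1, 2, 3, 1, 8)

# Competition ranking: ties share the smallest rank, the next ranks are skipped
r <- rank(ranks, ties.method = "min")
r
#[1] 6 7 3 1 3 5 1 7

# Force the all-1s columns back to 8. Testing ranks == 8 only works here
# because a real first position can be at most 5 (the number of rows).
r[ranks == 8] <- 8
r
#[1] 6 8 3 1 3 5 1 8
```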


Solution

  • One option is to loop through the columns of 'df1' (the data frame called 'subset' in the question) with sapply and get the first position where the value is greater than 1; if no value is greater than 1, the result is NA. Then we rank 'indx' with ties.method = 'min' ('indx1'), so tied positions share the lowest rank and the following ranks are skipped. As the last step, the elements of 'indx1' at the positions where 'indx' is NA are set to 8.

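     # first row in each column with a value > 1; NA if there is none
     indx <- sapply(df1, function(x) which(x>1)[1L])
     # competition ranking: tied positions share the lowest rank
     indx1 <- as.vector(rank(indx, ties.method='min'))
     # columns that contain only 1s get rank 8
     indx1[is.na(indx)] <- 8
     indx1
     #[1] 6 8 3 1 3 5 1 8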
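If this is needed for other data sets, the same steps can be wrapped in a small function. This is only a sketch: the name `rank_first_exceedance` and its `threshold` and `never` arguments are hypothetical, chosen for illustration. It uses `vapply` so each column is guaranteed to return a single integer, and `na.last = "keep"` so the columns that never exceed the threshold stay NA until they are explicitly given the rank `never` (defaulting to the number of columns, which is 8 here).

```r
# Hypothetical helper wrapping the answer's steps; not from any package.
rank_first_exceedance <- function(df, threshold = 1, never = ncol(df)) {
  # First row in each column where the value exceeds `threshold`; NA if none
  first <- vapply(df, function(x) which(x > threshold)[1L], integer(1))
  # Competition ranking of the observed positions; NAs are kept as NA
  out <- rank(first, na.last = "keep", ties.method = "min")
  # Columns that never exceed the threshold get the fixed rank `never`
  out[is.na(first)] <- never
  unname(out)
}

# The question's data, rebuilt from the dput() output
df1 <- data.frame(
  MEMORY1 = c(1L, 1L, 1L, 1L, 2L), MEMORY2 = c(1L, 1L, 1L, 1L, 1L),
  MEMORY3 = c(1L, 2L, 1L, 1L, 1L), MEMORY4 = c(2L, 2L, 2L, 2L, 2L),
  MEMORY5 = c(1L, 2L, 1L, 2L, 1L), MEMORY6 = c(1L, 1L, 2L, 1L, 2L),
  MEMORY7 = c(2L, 2L, 2L, 2L, 1L), MEMORY8 = c(1L, 1L, 1L, 1L, 1L)
)

rank_first_exceedance(df1)
#[1] 6 8 3 1 3 5 1 8
```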