r, if-statement, median

Calculate median for each subject with update on ties?


I have data which looks like this (this is test data for illustration):

test <- matrix(c(1, 1, 1, 2, 2, 2, 529, 528, 528, 495, 525, 510, 557, 535, 313, 502, 474, 487), nrow=6, dimnames=list(c(1,2,3,4,5,6), c("subject", "rt1", "rt2")))

And I need to turn it into this:

test2 <- matrix(c(1, 1, 1, 2, 2, 2, 529, 528, 528, 495, 525, 510, "slow", "slow", "fast", "fast", "slow", "slow", 557, 535, 313, 502, 474, 487, "fast", "fast", "slow", "slow", "fast", "fast"), nrow=6, dimnames=list(c(1,2,3,4,5,6), c("subject", "rt1", "speed1", "rt2", "speed2")))

The speed1 column is calculated as follows: compute the median rt1 for the subject. If an individual value is below the median, it scores fast; if it is above the median, it scores slow. If a value equals the median, it is removed from the analysis (deleted or set to NA) and the median for that subject is recalculated. The same process is repeated for the speed2 column, using rt2.

Perhaps some kind of if statement?

To clarify: I want the median for each subject (there are 40 in total), with any values at that subject's median excluded and the median then recalculated for that subject.


Solution

  • Following on from John's answer, to compute per-subject medians, use tapply:

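    # Work on a data frame and treat subject as a grouping factor
    test2 <- data.frame(test)
    test2$subject <- factor(test2$subject)
    # One row per subject holding the median of each rt column
    test3 <- data.frame(subject = levels(test2$subject),
                        median.rt1 = tapply(test2$rt1, test2$subject, median),
                        median.rt2 = tapply(test2$rt2, test2$subject, median))
    # Attach the per-subject medians to every row
    test2 <- merge(test2, test3)
    # Values at the median fall into 'slow' here; they are removed below
    test2$speed1 <- ifelse(test2$rt1 < test2$median.rt1, 'fast', 'slow')
    test2$speed2 <- ifelse(test2$rt2 < test2$median.rt2, 'fast', 'slow')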
    

    To remove the values at the median, you can use:

    subset(test2,!(rt1==median.rt1 | rt2==median.rt2))
    

    Alternatively, use a tolerance-based test if you expect floating-point representation error to cause problems with the exact equality test. You can then run the tapply and merge lines again (after dropping the original median columns, which would otherwise clash with the new ones) to calculate new medians, and redo the speed classifications if you want to. Personally, I would use a nested ifelse to classify each value as fast, slow or average instead.
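
One way to combine the tolerance test, the recalculated median and the nested ifelse is sketched below. It is only an illustration under some assumptions: `classify_speed` and its `tol` argument are hypothetical names that do not appear in the answer above, `ave` is swapped in for the tapply and merge steps, values at the median are set to NA rather than deleted so that rt1 and rt2 are handled independently, and the median is recalculated once rather than repeatedly.

```r
# Test data from the question
test <- matrix(c(1, 1, 1, 2, 2, 2,
                 529, 528, 528, 495, 525, 510,
                 557, 535, 313, 502, 474, 487),
               nrow = 6,
               dimnames = list(1:6, c("subject", "rt1", "rt2")))
dat <- data.frame(test)

# Hypothetical helper (not from the original answer).
# rt: reaction times; subject: grouping variable; tol: assumed tolerance
classify_speed <- function(rt, subject, tol = 1e-8) {
  # Per-subject median, returned with one value per row
  med <- ave(rt, subject, FUN = median)
  # Tolerance-based test: treat values within tol of the median as ties
  rt[abs(rt - med) < tol] <- NA
  # Recalculate the per-subject median without the tied values
  med2 <- ave(rt, subject, FUN = function(x) median(x, na.rm = TRUE))
  # Nested ifelse: below -> fast, above -> slow, anything else -> NA
  ifelse(rt < med2, "fast", ifelse(rt > med2, "slow", NA))
}

dat$speed1 <- classify_speed(dat$rt1, dat$subject)
dat$speed2 <- classify_speed(dat$rt2, dat$subject)
print(dat)
```

Because the last step is three-way, a value that happens to equal the recalculated median also comes out as NA. In the test data this affects subject 1's rt1 of 529, which is the only value left after the two 528s are dropped. If a third label such as "average" is preferred, replace the final NA in the nested ifelse with that string.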