r, loops, median

Compute median per column in loop


I have this loop to compute the mean per column, which works.

for (i in 1:length(DF1)) {     
    tempA <- DF1[i]                                 # save column of DF1 onto temp variable 
    names(tempA) <- 'word'                          # label temp variable for inner_join function
    DF2 <- inner_join(tempA, DF0, by='word')        # match words with numeric value from look-up DF0
    tempB <- as.data.frame(t(colMeans(DF2[-1])))    # compute mean of column
    DF3 <- rbind(tempB, DF3)                        # save results together
}

The script uses the dplyr package for inner_join.

  • DF0 is the look-up database with 4 columns (word, value1, value2, value3).
  • DF1 is the text data with one word per cell.
  • DF3 is the output.

Now I want to compute the median instead of the mean. It seemed easy enough with the colMedians function from 'robustbase', but I can't get the below to work.

library(robustbase)

for (i in 1:length(DF1)) {     
    tempA <- DF1[i]
    names(tempA) <- 'word'
    DF2 <- inner_join(tempA, DF0, by='word')
    tempB <- as.data.frame(t(colMedians(DF2[-1])))
    DF3 <- rbind(tempB, DF3)
}

The error message reads:

Error in colMedians(tog[-1]) : Argument 'x' must be a matrix.

I've tried converting DF2 to a matrix before calling colMedians, but I still get the same error message:

Error in colMedians(tog[-1]) : Argument 'x' must be a matrix.

I don't understand what is going on here. Thanks for the help!

Happy to provide sample data and error traceback, but trying to keep it as crisp and simple as possible.


Solution

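  • According to the OP's comment, the following solved the problem.
    colMedians requires a matrix, and DF2[-1] is still a data frame, so the fix is to convert it first: colMedians(data.matrix(DF2[-1]), na.rm = TRUE).
    I have also added a call to library(dplyr), which inner_join needs.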

    library(robustbase)
    library(dplyr)
    
    for (i in 1:length(DF1)) {     
        tempA <- DF1[i]
        names(tempA) <- 'word'
        DF2 <- inner_join(tempA, DF0, by='word')
        tempB <- colMedians(data.matrix(DF2[-1]), na.rm = TRUE)
        DF3 <- rbind(tempB, DF3) 
    }
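
To see the data frame versus matrix issue in isolation, here is a minimal sketch on made-up data. The values in DF0 and DF1 are hypothetical, and so are the column names text1 and text2. To keep it runnable without extra packages, it uses base R's merge() as a stand-in for inner_join and apply(..., 2, median) as a stand-in for colMedians; this is an assumption for illustration, not the code the OP ran.

```r
# Toy stand-ins for the data described in the question (hypothetical values).
DF0 <- data.frame(
  word   = c("a", "b", "c", "d"),
  value1 = c(1, 2, 3, 10),
  value2 = c(4, 5, 6, 20),
  value3 = c(7, 8, 9, 30),
  stringsAsFactors = FALSE
)
DF1 <- data.frame(
  text1 = c("a", "b", "c"),
  text2 = c("b", "c", "d"),
  stringsAsFactors = FALSE
)

DF3 <- NULL  # the results object must exist before the first rbind()

for (i in seq_along(DF1)) {
  tempA <- DF1[i]
  names(tempA) <- "word"
  DF2 <- merge(tempA, DF0, by = "word")  # base-R inner join on 'word'

  vals <- DF2[-1]                        # still a data.frame, not a matrix
  stopifnot(!is.matrix(vals), is.matrix(data.matrix(vals)))

  # Column medians of the numeric matrix; base-R stand-in for colMedians()
  tempB <- apply(data.matrix(vals), 2, median, na.rm = TRUE)
  DF3 <- rbind(tempB, DF3)               # newest result is placed on top
}

print(DF3)
```

Because each new result is bound above the previous ones, the rows of DF3 end up in reverse order of the columns of DF1. Swapping the arguments to rbind(DF3, tempB) would keep the original order.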