Subtract the column mean from every element with dplyr


I want to demean all my columns using dplyr. I tried the do() function but could not get it to work.

Basically, I want to replicate the following with simpler dplyr commands:

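library(dplyr)

tickers <- c(rep(1, 10), rep(2, 10))
df <- data.frame(cbind(tickers, 1:20, 2:21))
colnames(df) <- c("tickers", "col1", "col2")
df %>% group_by(tickers)   # result is not stored, so this grouping has no effect below
# subtract each column's mean from every element of that column
apply(df[, 2:3], 2, function(x) x - mean(x))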

I am sure this can be done much better using dplyr.

Thanks!


Solution

  • If we are using dplyr, we can do this with mutate_each() and match the columns with any of the helpers listed in ?select. Here I use matches(), which takes a regular expression as its pattern.

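    library(dplyr)
    # subtract the mean from every column whose name starts with "col",
    # then drop the tickers column
    df %>%
        mutate_each(funs(. - mean(.)), matches('^col')) %>%
        select(-tickers)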
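In current dplyr releases, mutate_each() and funs() are deprecated, and since dplyr 1.0.0 the usual replacement is across() inside mutate(). A minimal sketch of the same idea with across(), assuming dplyr 1.0.0 or later and rebuilding the question's data with named columns; starts_with("col") stands in for matches('^col'). The second pipeline is an assumption: it only applies if the group_by(tickers) in the question was meant to demean within each ticker rather than over the whole column.

```r
library(dplyr)

# rebuild the example data from the question with named columns
tickers <- c(rep(1, 10), rep(2, 10))
df <- data.frame(tickers = tickers, col1 = 1:20, col2 = 2:21)

# demean every column whose name starts with "col" over the whole data set,
# then drop the tickers column as in the answer above
demeaned <- df %>%
  mutate(across(starts_with("col"), ~ .x - mean(.x))) %>%
  select(-tickers)

# assumption: demean within each ticker instead, using the question's grouping
demeaned_by_ticker <- df %>%
  group_by(tickers) %>%
  mutate(across(starts_with("col"), ~ .x - mean(.x))) %>%
  ungroup()

print(demeaned)
print(demeaned_by_ticker)
```

The grouped version keeps the tickers column so you can still see which group each demeaned row belongs to.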
    

    This can also be done in base R:

    # subtract each column's mean from every element of that column
    df[2:3] - colMeans(df[2:3])[col(df[2:3])]
    

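    colMeans() returns one mean per column. col(df[2:3]) gives the column number of every cell, so indexing the vector of means with it repeats each mean once for every cell in its column. The result has the same length as df[2:3] and lines up with it element by element, so the subtraction removes the right mean from each value.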
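To make the col() indexing concrete, here is a small base R sketch on the same data. It also checks the result against two other base R idioms that are not part of the original answer: sweep(), which applies one statistic per column with "-" as its default function, and scale() with center = TRUE, scale = FALSE, which centres each column without rescaling it. The helper as_plain() is hypothetical, written only to compare the results without their names and attributes.

```r
# rebuild the example data from the question with named columns
df <- data.frame(tickers = c(rep(1, 10), rep(2, 10)), col1 = 1:20, col2 = 2:21)
x <- df[2:3]

# col() gives the column number of every cell: all 1s in column 1, all 2s in column 2
idx <- col(x)
print(idx[1:3, ])

# the answer's approach: look up each cell's column mean and subtract it
via_col <- x - colMeans(x)[idx]
# sweep() subtracts one statistic per column (MARGIN = 2); the default FUN is "-"
via_sweep <- sweep(x, 2, colMeans(x))
# scale() centres each column; scale = FALSE skips dividing by the standard deviation
via_scale <- scale(x, center = TRUE, scale = FALSE)

# hypothetical helper: strip names and attributes so the three results compare directly
as_plain <- function(m) matrix(as.numeric(as.matrix(m)), nrow = nrow(m))

stopifnot(isTRUE(all.equal(as_plain(via_col), as_plain(via_sweep))))
stopifnot(isTRUE(all.equal(as_plain(via_col), as_plain(via_scale))))
```

scale() returns a matrix carrying a "scaled:center" attribute, while the other two return a data frame, so the choice mostly depends on which type the next step expects.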