R: Standard deviation across columns and rows by id


I have several data frames that look similar to the following one (but with many more columns):

id col1 col2 col3 col4 col5
1   4    3    5    4    A
1   3    5    4    9    Z
1   5    8    3    4    H
2   6    9    2    1    B
2   4    9    5    4    K
3   2    1    7    5    J
3   5    8    4    3    B
3   6    4    3    9    C

I want to calculate the standard deviation across specific columns (let's say col2 to col4), grouped by id. I do not know the column indices in every data frame; I only know the names of the columns I want to calculate the standard deviation for.

Is there a way I could do that easily? My original data frames contain around 20 columns and I only want the standard deviation for 10 columns with specific column names grouped by the id.

On top of that, it would be nice if I could directly add the calculated standard deviations to my data frame as a new column matched by id, looking like this:

id col1 col2 col3 col4 col5 SD
1   4    3    5    4    A   SD1
1   3    5    4    9    Z   SD1
1   5    8    3    4    H   SD1
2   6    9    2    1    B   SD2
2   4    9    5    4    K   SD2
3   2    1    7    5    J   SD3
3   5    8    4    3    B   SD3
3   6    4    3    9    C   SD3

Solution

  • You can try:

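    library(dplyr)

    df %>%
      group_by(id) %>%
      # Pool all col2:col4 values within each id and take a single SD.
      # Note: cur_data() is deprecated as of dplyr 1.1.0; there,
      # pick(col2:col4) replaces select(cur_data(), col2:col4).
      mutate(SD = sd(unlist(select(cur_data(), col2:col4))))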
    
    #    id  col1  col2  col3  col4 col5     SD
    #  <int> <int> <int> <int> <int> <chr> <dbl>
    #1     1     4     3     5     4 A      2.12
    #2     1     3     5     4     9 Z      2.12
    #3     1     5     8     3     4 H      2.12
    #4     2     6     9     2     1 B      3.41
    #5     2     4     9     5     4 K      3.41
    #6     3     2     1     7     5 J      2.62
    #7     3     5     8     4     3 B      2.62
    #8     3     6     4     3     9 C      2.62
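
Since the question says only the column names are known, and that they may not be adjacent, here is a minimal base R sketch of the same pooled-SD-per-id idea, driven by a character vector of names. The vector name `sd_cols` and the helper `sd_by_id` are made up for illustration, and the data frame is rebuilt from the example in the question. As in the answer above, it assumes the goal is one SD over all values of the chosen columns within each id.

```r
# Example data copied from the question.
df <- data.frame(
  id   = c(1, 1, 1, 2, 2, 3, 3, 3),
  col1 = c(4, 3, 5, 6, 4, 2, 5, 6),
  col2 = c(3, 5, 8, 9, 9, 1, 8, 4),
  col3 = c(5, 4, 3, 2, 5, 7, 4, 3),
  col4 = c(4, 9, 4, 1, 4, 5, 3, 9),
  col5 = c("A", "Z", "H", "B", "K", "J", "B", "C")
)

# Hypothetical name: the columns to pool, given by name rather than position.
sd_cols <- c("col2", "col3", "col4")

# Split the selected columns by id, flatten each group to a plain vector,
# and compute one standard deviation per id (a named numeric vector).
sd_by_id <- sapply(split(df[sd_cols], df$id), function(g) sd(unlist(g)))

# Map each row's id back to its group SD and attach it as a new column.
df$SD <- unname(sd_by_id[as.character(df$id)])

print(df)
```

On the example data this should give the same SD column as the dplyr output above. In the dplyr version, the same name-based selection can be written with `all_of(sd_cols)` in place of `col2:col4`.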