
How do I create a row of SD values for specific columns of a data frame?


I have a 160 by 250,000 data frame. I want to find the SD of all columns except the first (249,999 columns) in R. Is it possible to do this and add a row of the SD values? These are probe beta-values for DNA methylation.


Solution

  • Using mtcars:

    mt <- mtcars[1:5,]
    rbind(mt, "Standard Deviation" = c(NA, sapply(mt[,-1], sd)))
    #                    mpg cyl disp  hp drat   wt qsec   vs   am gear carb
    # Mazda RX4           21 6.0  160 110 3.90 2.62 16.5 0.00 1.00 4.00  4.0
    # Mazda RX4 Wag       21 6.0  160 110 3.90 2.88 17.0 0.00 1.00 4.00  4.0
    # Datsun 710          23 4.0  108  93 3.85 2.32 18.6 1.00 1.00 4.00  1.0
    # Hornet 4 Drive      21 6.0  258 110 3.08 3.21 19.4 1.00 0.00 3.00  1.0
    # Hornet Sportabout   19 8.0  360 175 3.15 3.44 17.0 0.00 0.00 3.00  2.0
    # Standard Deviation  NA 1.4  100  32 0.42 0.45  1.3 0.55 0.55 0.55  1.5
    

    Explanation:

    • sapply(mt[,-1], sd) runs the sd function on every column except the first (mt[,-1] drops column 1). Because each call returns a single numeric value, sapply simplifies the results into a named numeric vector.
    • c(NA, ..) pads the first position with NA because we don't compute a standard deviation for the first column; since we are row-binding, the new row's length must equal the number of columns in mt.
    • rbind(mt, ..) is row-binding, i.e. row-wise concatenation. Because mtcars identifies its rows by row names, I named the new row "Standard Deviation". This may not be necessary or appropriate with your data.
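
If your first column holds sample identifiers rather than numbers, you may want a label there instead of NA. A minimal sketch of that variation, assuming a character ID column: the data frame df and the names sample_id, probe_a, probe_b, and probe_c are made up for illustration and are not from the question. The SD row is built as a one-row data frame instead of with c(), because c("SD", ...) would turn the whole vector into character.

```r
# Hypothetical stand-in for the real data: one ID column plus numeric probe columns.
df <- data.frame(
  sample_id = paste0("S", 1:4),
  probe_a   = c(0.10, 0.20, 0.30, 0.40),
  probe_b   = c(0.50, 0.50, 0.50, 0.50),
  probe_c   = c(0.90, 0.70, 0.50, 0.30),
  stringsAsFactors = FALSE
)

# Column-wise SD for every column except the first; a named numeric vector.
sds <- sapply(df[, -1], sd)

# One-row data frame: the label stays character, the SDs stay numeric.
sd_row <- data.frame(sample_id = "SD", as.list(sds), stringsAsFactors = FALSE)

# Append the SD row; columns are matched by name.
result <- rbind(df, sd_row)
print(result)
```

The numeric columns keep their type this way, so later arithmetic on result still works; just remember to exclude the last row from any further per-column statistics.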