I have a 160 by 250,000 data frame. I want to find the SD of all columns except the first (249,999 columns) in R. Is it possible to do this and add a row of the SD values? These are probe beta-values for DNA methylation.
Using mtcars:
mt <- mtcars[1:5,]
rbind(mt, "Standard Deviation" = c(NA, sapply(mt[,-1], sd)))
#                    mpg cyl disp  hp drat   wt qsec   vs   am gear carb
# Mazda RX4           21 6.0  160 110 3.90 2.62 16.5 0.00 1.00 4.00  4.0
# Mazda RX4 Wag       21 6.0  160 110 3.90 2.88 17.0 0.00 1.00 4.00  4.0
# Datsun 710          23 4.0  108  93 3.85 2.32 18.6 1.00 1.00 4.00  1.0
# Hornet 4 Drive      21 6.0  258 110 3.08 3.21 19.4 1.00 0.00 3.00  1.0
# Hornet Sportabout   19 8.0  360 175 3.15 3.44 17.0 0.00 0.00 3.00  2.0
# Standard Deviation  NA 1.4  100  32 0.42 0.45  1.3 0.55 0.55 0.55  1.5
Explanation:

sapply(mt[,-1], sd) runs the sd function on each column except the first (mt[,-1]). Because this is sapply and the return values are all the same "shape" and class (a single number each), it returns a numeric vector.

c(NA, ..) puts an NA in front because we did not compute a standard deviation for the first column; since we are row-binding, the vector's length has to match the number of columns in mt.

rbind(mt, ..) is row-binding, aka row-concatenation. Because mtcars identifies its rows by row names, I named the new row "Standard Deviation". This may not be necessary or appropriate with your data.
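For data shaped more like yours (an ID column first, then numeric probe columns), the same steps might look roughly like the sketch below. The data frame betas, the column name sample, the probe names cg1 to cg4, and the "SD" label are all made up for illustration; I'm assuming your first column holds character sample IDs. vapply is used in place of sapply only because it checks that each result is a single number.

```r
set.seed(1)

# Hypothetical stand-in for the methylation data: 6 samples, 4 probes.
betas <- data.frame(
  sample = paste0("S", 1:6),
  matrix(runif(6 * 4), nrow = 6,
         dimnames = list(NULL, paste0("cg", 1:4))),
  stringsAsFactors = FALSE
)

# SD of every column except the first; numeric(1) asserts one number per column.
probe_sd <- vapply(betas[, -1], sd, numeric(1))

# Pad with NA for the ID column so the length matches ncol(betas), then row-bind.
with_sd <- rbind(betas, c(NA, probe_sd))

# Label the new row in the ID column (assumes that column is character).
with_sd$sample[nrow(with_sd)] <- "SD"
```

Keeping probe_sd as a separate named vector may be the safer choice if you go on to do further statistics, since an appended summary row would otherwise be treated as one more sample.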