I have a data frame with 13 columns. the 13th column shows the group number each row belongs to. I want to take the standard deviation of all values in all rows of columns 3 to 12, for rows of each group.
for(i in 1: groupnumber) {
sd.vect[i] <- sd(as.vector(df[df$group==i,][,-c(1,2,13)]))}
I get error
Error in is.data.frame(x) : 'list' object cannot be coerced to type 'double'
how can I get sd of all values in each group?
You can use dplyr. You can group_by(grouping_column)
, then summarise()
with the sd()
function:
library(dplyr)
data_frame%>%>group_by(thirteenth_column)%>%summarise(across(-c(1,2), sd))
As an example with mtcars:
library(dplyr)
mtcars%>%group_by(cyl)%>%summarise(across(everything(), sd))
# A tibble: 3 x 11
cyl mpg disp hp drat wt qsec vs am gear carb
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 4 4.51 26.9 20.9 0.365 0.570 1.68 0.302 0.467 0.539 0.522
2 6 1.45 41.6 24.3 0.476 0.356 1.71 0.535 0.535 0.690 1.81
3 8 2.56 67.8 51.0 0.372 0.759 1.20 0 0.363 0.726 1.56
That said, with some corrections, your for loop works: You just have to make sure to return a dataframe, as your desired outcome is actually a two-dimension object. See an example with the iris dataset:
sd.df<-data.frame()
for(i in unique(iris$Species)) {
for (j in 1:ncol(iris[-5])){
sd.df[i,j] <- sd(iris[iris$Species==i,j])
}
}
sd.df
V1 V2 V3 V4
setosa 0.3524897 0.3790644 0.1736640 0.1053856
versicolor 0.5161711 0.3137983 0.4699110 0.1977527
virginica 0.6358796 0.3224966 0.5518947 0.2746501
update
I understand from your comments you may want something quite odd, which would be to group your data by the grouping variable, than get the standard deviation for an aggregate of all values in each sub-dataframe. You probably would be better off working with matrices here.
For that, you may neet to group your data by unique()
values in the grouping column, then call sd()
on the rest of the dataframe (all values), which can be done if you coerce the dataframe into a matrix:
library(dplyr)
library(purrr)
map_dbl(unique(mtcars$cyl), ~as.matrix(mtcars%>%
filter(cyl==.x)%>%
select(-cyl))%>%
sd())%>%
set_names(., unique(mtcars$cyl))
6 4 8
62.47655 37.54494 118.18945
With your data:
map(unique(df[[13]]), ~as.matrix(df%>%
filter(df[[13]]==.x)%>%
select(-c(1,2,13)))%>%
sd()%>%
set_names(., unique(df[[13]]))
And a much simpler answer with base subsetting and split()
:
map_dbl(split(mtcars[-c(1,2, 10)], mtcars[10]), ~sd(as.matrix(.x)))
3 4 5
119.47824 47.97490 98.71733