
Compute population sd for grouped variables


This is my tibble:

library(dplyr)

df <- tibble(x = c("a", "a", "a", "b", "b", "b"), y = c(1, 2, 3, 4, 6, 8))
df
# A tibble: 6 x 2
  x         y
  <chr> <dbl>
1 a         1
2 a         2
3 a         3
4 b         4
5 b         6
6 b         8

I want to compute the population sd of y for each group of x.

I tried rescaling the sample standard deviation with this formula:

sqrt((n-1)/n) * sd(x)

and used it inside a dplyr pipeline like this:

df %>%
  group_by(x) %>%
  summarise(sd = sqrt((length(df$y)-1)/length(df$y)) * sd(y)) %>%
  ungroup()

# A tibble: 2 x 2
  x        sd
* <chr> <dbl>
1 a     0.913
2 b     1.83 

Of course this is incorrect: length(df$y) refers to the whole column of df, not to the current group, so it uses n = 6 instead of n = 3. I expected:

a = 0.8164966
b = 1.632993
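As a worked check of these numbers: the rescaling factor turns the sample sd s (divisor n - 1, which is what R's sd() computes) into the population sd σ (divisor n). The arithmetic for the two groups, with n = 3 each:

```latex
\sigma = \sqrt{\frac{n-1}{n}}\; s = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (y_i - \bar{y})^2},
\qquad s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (y_i - \bar{y})^2}

\text{a: } \bar{y} = 2,\quad \sum (y_i - \bar{y})^2 = 1 + 0 + 1 = 2,\quad \sigma = \sqrt{2/3} \approx 0.8164966

\text{b: } \bar{y} = 6,\quad \sum (y_i - \bar{y})^2 = 4 + 0 + 4 = 8,\quad \sigma = \sqrt{8/3} \approx 1.632993
```

With n = 6 the factor becomes sqrt(5/6) ≈ 0.9129; multiplied by the sample sds 1 and 2 it gives exactly the 0.913 and 1.83 in the incorrect output.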

Edit:

The output should be a tibble with the grouping variable(s) and the sd for each group.


Solution

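  • Use the n() function instead of length(df$y). Inside summarise(), n() returns the number of rows in the current group, so the correction factor uses n = 3 for each group here:

    df %>%
        group_by(x) %>%
        summarise(sd = sqrt((n() - 1) / n()) * sd(y)) %>%
        ungroup()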
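An alternative sketch, assuming you prefer to compute the population sd straight from its definition instead of rescaling sd(). The helper name pop_sd is hypothetical (it is not part of dplyr or base R), and it does not handle NA values.

```r
library(dplyr)

# Hypothetical helper: population standard deviation (divides by n, not n - 1)
pop_sd <- function(v) sqrt(mean((v - mean(v))^2))

df <- tibble(x = c("a", "a", "a", "b", "b", "b"), y = c(1, 2, 3, 4, 6, 8))

# Inside summarise(), pop_sd() only sees the y values of the current group
res <- df %>%
  group_by(x) %>%
  summarise(sd = pop_sd(y)) %>%
  ungroup()

res
```

Because pop_sd() receives the grouped vector directly, there is no separate group size to get wrong; the result should match the n()-based version (sqrt(2/3) for a, sqrt(8/3) for b).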