Search code examples
rdplyrpipetidyversedata-cleaning

Using dplyr to group the new calculations into one data frame


I have the following table and I have to obtain a standard deviation of y for each unique value of x.

  ID    x   y

   1    1   4   
   2    2   3   
   3    3   7   
   4    1   2   
   5    2   6   
   6    3   8   

For example, each unique value of x, I have y=4 and y=2, so the standard deviation will be:

x1 <- c(4,2)
sd(x1) 
#output is 1.41

x2 <-c(3,6)
sd(x2)
#output is 2.21

x3 <-c(3,6)
sd(x3)
#output is 0.71

Instead of getting each output and put it in a data frame using the long way, is there a way to do it faster using dplyr and the pipe? I tried to use mutate and group_by, but it doesn't seem to work. I would like the result to look the following with count_y (# of y values to each unique x)

x   count_y  Std_Dev
    
1   2        1.41
2   2        2.21
3   2        0.71

Solution

  • We don't need mutate (mutate creates or transforms column). Here, the output needed is one row per group which can be done with summarise

    library(dplyr)
    df1 %>% 
       group_by(x) %>%
       summarise(count_y = n(), Std_Dev = sd(y))
    

    -output

    # A tibble: 3 × 3
          x count_y Std_Dev
      <int>   <int>   <dbl>
    1     1       2   1.41 
    2     2       2   2.12 
    3     3       2   0.707
    

    data

    df1 <- structure(list(ID = 1:6, x = c(1L, 2L, 3L, 1L, 2L, 3L), y = c(4L, 
    3L, 7L, 2L, 6L, 8L)), class = "data.frame", row.names = c(NA, 
    -6L))