Tags: r, testing, dplyr, multiple-columns, division

Same operation (division) by groups for all columns in R


I have a dataset like Christmas:

# tibble() replaces the deprecated data_frame()
Christmas <- tibble(month = c("1", "1", "2", "2"),
                    NP = c(2, 3, 3, 1),
                    ND = c(4, 2, 0, 6),
                    NO = c(1, 5, 2, 4),
                    variable = c("mean", "sd", "mean", "sd"))

and I want to calculate the t-statistic of each column by month, using the formula t-statistic = mean / sd. (Note: I want to do this for all value columns; in this case they are only NP, ND, and NO.)

The new dataset should look like t_statistic:

# tibble() replaces the deprecated data_frame()
t_statistic <- tibble(
  month = c("1", "2"),
  NP = c(2/3, 3),
  ND = c(4/2, 0),
  NO = c(1/5, 2/4)
)

Any clue?


Solution

  • If the mean/sd values are already computed, the result is just the first element divided by the last within each group (there are only two rows per group, with the mean row first)

    library(dplyr)
    out <- Christmas %>% 
        group_by(month) %>% 
        summarise(across(NP:NO,  ~first(.)/last(.)))
    

    Output:

    out
    # A tibble: 2 × 4
      month    NP    ND    NO
      <chr> <dbl> <dbl> <dbl>
    1 1     0.667     2   0.2
    2 2     3         0   0.5
    

    Checking against the OP's expected output:

    > identical(t_statistic, out)
    [1] TRUE
    

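    Or, if the mean/sd rows are not already ordered within each month, sort them first (arrange() puts "mean" before "sd" alphabetically)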

    Christmas %>%
       arrange(month, variable) %>%
       group_by(month) %>%
       summarise(across(NP:NO,  ~first(.)/last(.)))
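
Both versions above depend on row position within each group. As a hedged alternative, not part of the original answer, the rows could be picked by their label in the `variable` column instead. This is a minimal sketch that assumes each month has exactly one "mean" row and one "sd" row; the data is the OP's, with the rows of month "2" deliberately swapped for illustration.

```r
library(dplyr)

# OP's data, but with the two rows of month "2" swapped (illustrative assumption)
Christmas <- tibble(
  month    = c("1", "1", "2", "2"),
  NP       = c(2, 3, 1, 3),
  ND       = c(4, 2, 6, 0),
  NO       = c(1, 5, 4, 2),
  variable = c("mean", "sd", "sd", "mean")
)

out <- Christmas %>%
  group_by(month) %>%
  # select the numerator and denominator by label rather than by position;
  # assumes exactly one "mean" and one "sd" row per month
  summarise(across(NP:NO, ~ .x[variable == "mean"] / .x[variable == "sd"]))

out
```

Because the lookup is by label, no `arrange()` step is needed, and a group that is missing one of the two rows makes `summarise()` return zero rows for that group rather than a silently wrong ratio.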