dplyr groups not working with dollar sign data$column syntax


I'm looking to find the min and max values of a column for each group:

mtcars %>%
  group_by(mtcars$cyl) %>%
  summarize(
    min_mpg = min(mtcars$mpg),
    max_mpg = max(mtcars$mpg)
  )
# # A tibble: 3 x 3
#   `mtcars$cyl` min_mpg max_mpg
#          <dbl>   <dbl>   <dbl>
# 1            4    10.4    33.9
# 2            6    10.4    33.9
# 3            8    10.4    33.9

The output has the right shape, with one row per group. However, every row shows the min and max of the entire dataset, not of that individual group.


Solution

  • Don't use $ inside dplyr functions; they expect unquoted column names.

    mtcars$mpg specifically references the whole column from the original input data frame, not the grouped tibble coming out of group_by, so min and max are computed over all rows for every group. Remove the data$ prefix and it will work:

    mtcars %>%
      group_by(cyl) %>%
      summarize(
        min_mpg = min(mpg),
        max_mpg = max(mpg)
      )
    # # A tibble: 3 x 3
    #     cyl min_mpg max_mpg
    #   <dbl>   <dbl>   <dbl>
    # 1     4    21.4    33.9
    # 2     6    17.8    21.4
    # 3     8    10.4    19.2
    

    (Not to mention it's a lot less typing!)
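
To see the difference directly, here is a small sketch that counts how many values each expression sees inside summarize. The column names n_in_group, n_via_dollar and min_pronoun are made up for illustration. The .data pronoun is not part of the answer above; it is included on the assumption that you want to keep a $-style reference, since it points at the current group rather than the original data frame.

```r
library(dplyr)

result <- mtcars %>%
  group_by(cyl) %>%
  summarize(
    n_in_group   = length(mpg),         # bare name: only the current group's rows
    n_via_dollar = length(mtcars$mpg),  # data$column: always the full, ungrouped column
    min_pronoun  = min(.data$mpg)       # .data pronoun: respects the grouping
  )

result
```

n_via_dollar is 32 in every row because mtcars$mpg ignores the grouping, while n_in_group differs per group (11, 7 and 14 for 4, 6 and 8 cylinders).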