r, dplyr, tidyverse

Summarising the same column twice with dplyr returns NA


Consider the following minimal working example in R:

library(tidyverse)

df <- data_frame(
  colour=c('red', 'red', 'blue', 'blue'),
  value=c(1, 1, 2, 2)
)

df %>%
  group_by(colour) %>%
  summarise(
    value=mean(value),
    value.sd=sd(value),
  )

The output is

# A tibble: 2 × 3
  colour value value.sd
  <chr>  <dbl>    <dbl>
1 blue       2       NA
2 red        1       NA

when the expected output is

# A tibble: 2 × 3
  colour      value value.sd
  <chr>       <dbl>    <dbl>
1 blue            2        0
2 red             1        0

I know how to work around the issue; the following code produces the expected output:

df %>%
  group_by(colour) %>%
  summarise(
    value.mean=mean(value),
    value.sd=sd(value),
  )

My question is: am I using R/dplyr wrongly in the first code sample, or is this a bug in dplyr?


Solution

  • When I ran your code, I got a warning that data_frame() is deprecated; tibble() is its replacement.

    This works:

    df <- tibble(
        colour=c('red', 'red', 'blue', 'blue'),
        value=c(1, 1, 2, 2)
    )
    
    df %>%
        group_by(colour) %>%
        summarise(
            value.mean = mean(value),
            value.sd=sd(value)
        )
    
    # A tibble: 2 × 3
      colour value.mean value.sd
      <chr>       <dbl>    <dbl>
    1 blue            2        0
    2 red             1        0
    

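    Note that what makes the difference here is the name value.mean, not the switch to tibble(). summarise() evaluates its expressions in order, and a later expression can refer to a column created by an earlier one. In your first sample, value=mean(value) replaces value with a single number per group, so sd(value) is then computed on that one number, and sd() of a single value is NA. This is documented behaviour rather than a bug.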
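
    To check whether the NA comes from summarise() evaluating its expressions in order (so that sd(value) sees the value column just created by mean(value)), here is a minimal sketch. The names res_na and res_ok are only illustrative. Putting the sd() line first is one possible way to keep the output column named value:

    ```r
    library(dplyr)

    df <- tibble(
      colour = c("red", "red", "blue", "blue"),
      value  = c(1, 1, 2, 2)
    )

    # sd() of a single number is NA, which matches the unexpected output
    sd(2)

    # Original order: value is overwritten by its mean before sd() runs,
    # so sd() only sees one number per group
    res_na <- df %>%
      group_by(colour) %>%
      summarise(
        value    = mean(value),
        value.sd = sd(value)
      )

    # Reversed order: sd() runs on the original column first,
    # then value is replaced by its mean
    res_ok <- df %>%
      group_by(colour) %>%
      summarise(
        value.sd = sd(value),
        value    = mean(value)
      )

    res_na
    res_ok
    ```

    With the reversed order the columns come out as colour, value.sd, value; add a select() afterwards if the column order matters.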