Tags: r, group-by, summarize

Group-wise summarise() returns NA, even in a minimal example from the web


I am trying to generate year-wise summary statistics as follows:

data %>%
  group_by(year) %>%
    summarise(mean.abc = mean(abc), mean.def = mean(def), sd.abc = sd(abc), sd.def = sd(def))

This code returns a single row filled with NA in every column:

  mean.abc mean.def sd.abc sd.def
1       NA       NA     NA     NA

To narrow down the problem, I tried to reproduce a simple example from the web:

data(mtcars)

mtcars %>%
  group_by(cyl) %>%
  summarise(mean = mean(disp))

This also returns a single row instead of one row per cyl group:

      mean
1 230.7219

What am I doing wrong? I am loading the following packages:

loadpackage( c("foreign","haven", "tidyverse", "plyr", "stringr", "eeptools", "factoextra") )

Thank you for your support!


Solution

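  • Your issue is that summarise() from the plyr package does not do what you expect it to do. Because plyr is attached after tidyverse in your package list, plyr::summarise() masks dplyr::summarise(). The plyr version ignores the groups created by group_by(), so it summarises the whole data frame into a single row.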

    See the difference between:

    library(tidyverse)
    
    mtcars %>%
      group_by(cyl) %>%
      plyr::summarise(mean = mean(disp))
    #>       mean
    #> 1 230.7219
    

    and

    mtcars %>%
      group_by(cyl) %>%
      dplyr::summarise(mean = mean(disp))
    #> # A tibble: 3 x 2
    #>     cyl  mean
    #>   <dbl> <dbl>
    #> 1     4  105.
    #> 2     6  183.
    #> 3     8  353.
    

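    The NA results additionally suggest that your data contain missing values: mean() and sd() return NA as soon as their input contains an NA, unless you pass na.rm = TRUE. Calling dplyr::summarise() explicitly and dropping missing values should do the trick: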

    data %>%
      group_by(year) %>%
      dplyr::summarise(across(all_of(c('abc', 'def')),
                              .fns = list(mean = ~mean(., na.rm = TRUE),
                                          sd   = ~sd(., na.rm = TRUE))))
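To confirm that masking is the culprit in your own session, you can ask R which package the unqualified name summarise currently resolves to. The sketch below assumes dplyr and plyr are both installed and mimics your loading order (assuming your loadpackage() helper attaches packages in the order you list them). environmentName(), environment() and detach() are base R. Attaching plyr before tidyverse, or always writing dplyr::summarise() explicitly, avoids the problem as well.

```r
library(dplyr)
library(plyr)  # attached after dplyr, so plyr::summarise() now masks dplyr::summarise()

# Which package does the unqualified `summarise` resolve to right now?
masked_by <- environmentName(environment(summarise))
masked_by
#> [1] "plyr"

# Remove plyr from the search path. Its namespace may stay loaded if other
# packages import it, but only the attached copy causes the masking.
detach("package:plyr")

# The unqualified name now resolves to dplyr again.
restored <- environmentName(environment(summarise))
restored
#> [1] "dplyr"

# The grouped example now yields one row per cyl group.
mtcars %>%
  group_by(cyl) %>%
  summarise(mean = mean(disp))
```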
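A minimal sketch of the missing-value effect, on hypothetical toy data: the data frame toy and its values are made up for illustration, and only the column names year, abc and def come from the question. Without na.rm = TRUE, a single NA in a group turns that group's mean into NA; with it, each year gets its own mean and standard deviation. With a named list in .fns, across() names the output columns column_function, e.g. abc_mean and abc_sd.

```r
library(dplyr)

# Hypothetical toy data standing in for the asker's `data`:
# two years, with one missing value in each of `abc` and `def` in 2019.
toy <- data.frame(
  year = c(2019, 2019, 2019, 2020, 2020, 2020),
  abc  = c(1, 2, NA, 4, 5, 6),
  def  = c(10, NA, 30, 40, 50, 60)
)

# Without na.rm, mean.abc is NA for 2019 (which contains an NA) but 5 for 2020.
toy %>%
  group_by(year) %>%
  dplyr::summarise(mean.abc = mean(abc))

# With na.rm = TRUE, missing values are dropped before summarising.
result <- toy %>%
  group_by(year) %>%
  dplyr::summarise(across(all_of(c("abc", "def")),
                          .fns = list(mean = ~ mean(.x, na.rm = TRUE),
                                      sd   = ~ sd(.x, na.rm = TRUE))))
result
```

With this toy data, abc_mean comes out as 1.5 for 2019 and 5 for 2020.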