Tags: r, dplyr, digits, pander

dplyr::group_by appears to drop pander options in R


I am not sure whether this is a bug or something I am doing wrong, but I seem to lose my panderOptions('round', 2) setting when I use dplyr's group_by() on a data frame before passing it to pander. I'm working in an RStudio Notebook. Example below:

Load libraries and set options:

    library(pander)
    panderOptions('round', 2)
    panderOptions('keep.trailing.zeros', TRUE)
    panderOptions('table.split.table', Inf)

    library(tidyverse)
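Calling panderOptions() with only an option name returns that option's current value, so a quick way to check whether the global setting is still in effect at any point in the notebook is to read it back. A minimal sketch; it only inspects the options and does not by itself explain the group_by() behaviour:

```r
library(pander)

# Set the global defaults, as in the question
panderOptions('round', 2)
panderOptions('keep.trailing.zeros', TRUE)

# With no value argument, panderOptions() returns the option's current value
current_round <- panderOptions('round')
current_zeros <- panderOptions('keep.trailing.zeros')
print(current_round)
print(current_zeros)
```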

Create some data:

    set.seed(10)
    df <- data.frame(x = rnorm(10), y = rnorm(10), class = c("a", "b"))

    print(df)

              x            y  class
     0.01874617   1.10177950      a
    -0.18425254   0.75578151      b
    -1.37133055  -0.23823356      a
    -0.59916772   0.98744470      b
     0.29454513   0.74139013      a
     0.38979430   0.08934727      b
    -1.20807618  -0.95494386      a
    -0.36367602  -0.19515038      b
    -1.62667268   0.92552126      a
    -0.25647839   0.48297852      b

Manipulate df and use pander without any group_by() operation:

    # make a table and output data
    df_nogroup <- df %>%
      mutate(xy = x * y) %>%
      summarise(mean = mean(xy, na.rm = TRUE),
                sd = sd(xy, na.rm = TRUE),
                se = sd(xy, na.rm = TRUE)/sqrt(n()),
                CI95_upr = mean + (qnorm(0.975) * se),
                CI95_lwr = mean - (qnorm(0.975) * se),
                n = n())

    pander(df_nogroup, "No grouping step. Round working")

    ------------------------------------------
    mean   sd   se   CI95_upr   CI95_lwr   n 
    ------ ---- ---- ---------- ---------- ---
    -0.05  0.68 0.21    0.37      -0.47    10 
    ------------------------------------------

    Table: No grouping step. Round working
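For reference, the summary columns compute the usual normal-approximation 95% confidence interval for the mean of xy, with s = sd(xy), n = n(), the bar denoting mean(xy), and z_{0.975} = qnorm(0.975):

```latex
\begin{aligned}
\mathrm{se} &= \frac{s}{\sqrt{n}} \\
\mathrm{CI95\_upr} &= \bar{x} + z_{0.975}\,\mathrm{se} \\
\mathrm{CI95\_lwr} &= \bar{x} - z_{0.975}\,\mathrm{se}, \qquad z_{0.975} = \Phi^{-1}(0.975) \approx 1.96
\end{aligned}
```

As a check against the grouped table further down: group a has s ≈ 0.9674 and n = 5, so se ≈ 0.9674 / √5 ≈ 0.4326, and 0.04277 + 1.96 × 0.4326 ≈ 0.8907, which matches the printed se and CI95_upr.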

Now with group_by():

    df_group <- df %>%
      mutate(xy = x * y) %>%
      group_by(class) %>%
      summarise(mean = mean(xy, na.rm = TRUE),
                sd = sd(xy, na.rm = TRUE),
                se = sd(xy, na.rm = TRUE)/sqrt(n()),
                CI95_upr = mean + (qnorm(0.975) * se),
                CI95_lwr = mean - (qnorm(0.975) * se),
                n = n())

    pander(df_group, "Grouping appears to be the culprit")


    --------------------------------------------------------
     class    mean     sd     se    CI95_upr   CI95_lwr   n 
    ------- -------- ------ ------ ---------- ---------- ---
       a    0.04277  0.9674 0.4326  0.89069    -0.8051    5 

       b    -0.14979 0.2640 0.1181  0.08163    -0.3812    5 
    --------------------------------------------------------

    Table: Grouping appears to be the culprit
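One difference between the two pipelines that is easy to check is the class of the object handed to pander(): summarise() on an ungrouped data.frame returns a data.frame, while summarise() after group_by() returns a tibble (tbl_df). The sketch below only shows that the classes differ; that the tibble class is what bypasses the global round setting is an assumption, not something the example above establishes.

```r
library(dplyr)

set.seed(10)
df <- data.frame(x = rnorm(10), y = rnorm(10), class = c("a", "b"))

# Ungrouped summary: dplyr keeps the input's class, a plain data.frame
df_nogroup <- df %>%
  mutate(xy = x * y) %>%
  summarise(mean = mean(xy, na.rm = TRUE), n = n())

# Grouped summary: group_by() produces a grouped tibble, and summarise()
# returns a tibble with one row per group
df_group <- df %>%
  mutate(xy = x * y) %>%
  group_by(class) %>%
  summarise(mean = mean(xy, na.rm = TRUE), n = n())

print(class(df_nogroup))
print(class(df_group))
```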

My sessionInfo():

    R version 3.3.2 (2016-10-31)
    Platform: x86_64-w64-mingw32/x64 (64-bit)
    Running under: Windows 7 x64 (build 7601) Service Pack 1

    locale:
    [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
    [4] LC_NUMERIC=C                           LC_TIME=English_United States.1252

    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base

    other attached packages:
     [1] pander_0.6.0     lsmeans_2.25     estimability_1.2 lme4_1.1-12      Matrix_1.2-7.1   lubridate_1.6.0
     [7] dplyr_0.5.0      purrr_0.2.2      readr_1.0.0      tidyr_0.6.1      tibble_1.2       ggplot2_2.2.1
    [13] tidyverse_1.1.0  knitr_1.15.1

SOLUTION: After updating to the development version of pander (0.6.0), the problem is fixed.
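If updating pander is not an option, one untested workaround, resting on the assumption that the problem is tied to the tibble class returned by the grouped summarise(), is to convert the result to a plain data.frame before printing, so pander receives the same class as in the ungrouped case. A minimal sketch (df_plain is just an illustrative name):

```r
library(dplyr)
library(pander)

panderOptions('round', 2)
panderOptions('keep.trailing.zeros', TRUE)

set.seed(10)
df <- data.frame(x = rnorm(10), y = rnorm(10), class = c("a", "b"))

df_group <- df %>%
  mutate(xy = x * y) %>%
  group_by(class) %>%
  summarise(mean = mean(xy, na.rm = TRUE),
            sd = sd(xy, na.rm = TRUE),
            n = n())

# Drop the tibble classes so pander gets an ordinary data.frame
df_plain <- as.data.frame(df_group)

pander(df_plain, "Grouped summary converted to data.frame")
```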


Solution

  • I run into this a lot: you have to set the options inline, as arguments to pander(), rather than through panderOptions().

    library(dplyr)
    library(pander)
    
    set.seed(10)
    df <- data.frame(x = rnorm(10), y = rnorm(10), class = c("a", "b"))
    
    
    df_group <- df %>%
      mutate(xy = x * y) %>%
      group_by(class) %>%
      summarise(mean = mean(xy, na.rm = TRUE),
                sd = sd(xy, na.rm = TRUE),
                se = sd(xy, na.rm = TRUE)/sqrt(n()),
                CI95_upr = mean + (qnorm(0.975) * se), 
                CI95_lwr = mean - (qnorm(0.975) * se),
                n = n()) 
    
    
    pander(df_group, "Setting inline options fixes this", round = 2)
    
    -------------------------------------------------------
     class   mean     sd     se    CI95_upr   CI95_lwr   n 
    ------- ------- ------ ------ ---------- ---------- ---
       a     0.04    0.97   0.43     0.89      -0.81     5 
    
       b     -0.15   0.26   0.12     0.08      -0.38     5 
    -------------------------------------------------------
    
    Table: Setting inline options fixes this
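Because the inline arguments have to be repeated on every call, a small wrapper can keep them in one place. pander_2dp() below is a hypothetical helper, not part of pander; it forwards round and keep.trailing.zeros inline (pander passes these on to pandoc.table()) together with whatever other arguments, such as caption, you supply:

```r
library(dplyr)
library(pander)

# Hypothetical helper: always pass the rounding options inline
pander_2dp <- function(x, ...) {
  pander(x, ..., round = 2, keep.trailing.zeros = TRUE)
}

set.seed(10)
df <- data.frame(x = rnorm(10), y = rnorm(10), class = c("a", "b"))

df_group <- df %>%
  mutate(xy = x * y) %>%
  group_by(class) %>%
  summarise(mean = mean(xy, na.rm = TRUE), n = n())

pander_2dp(df_group, caption = "Inline options via a wrapper")
```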
    

    Session info for comparison. I am using a development version of dplyr.

    > sessionInfo()
    R version 3.3.2 (2016-10-31)
    Platform: x86_64-apple-darwin13.4.0 (64-bit)
    Running under: OS X El Capitan 10.11.6
    
    locale:
    [1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8
    
    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base     
    
    other attached packages:
    [1] bindrcpp_0.1     pander_0.6.0     dplyr_0.5.0.9000
    
    loaded via a namespace (and not attached):
     [1] lazyeval_0.2.0 magrittr_1.5   R6_2.2.0       assertthat_0.1 DBI_0.5-1      tools_3.3.2    tibble_1.2     Rcpp_0.12.9    digest_0.6.11  bindr_0.1