
summary_table in qwraps2 with group_by in R


I am trying out the qwraps2 package and some of its functions. In particular I am interested in the summary_table tool for output. I am using the iris data set for practice, but I noticed something strange when using group_by in the summary_table:

library(datasets)
data("iris")
options(qwraps2_markup = "markdown")
our_summary1 <-
  list("Sepal Length" =
       list("min" = ~ min(iris$Sepal.Length),
            "max" = ~ max(iris$Sepal.Length),
            "mean (sd)" = ~ qwraps2::mean_sd(iris$Sepal.Length)),
       "Sepal Width" =
       list("min" = ~ min(iris$Sepal.Width),
            "median" = ~ median(iris$Sepal.Width),
            "max" = ~ max(iris$Sepal.Width),
            "mean (sd)" = ~ qwraps2::mean_sd(iris$Sepal.Width)),
       "Petal Length" =
       list("min" = ~ min(iris$Petal.Length),
            "max" = ~ max(iris$Petal.Length),
            "mean (sd)" = ~ qwraps2::mean_sd(iris$Petal.Length)),
       "Petal Width" =
       list("min" = ~ min(iris$Petal.Width),
            "max" = ~ max(iris$Petal.Width),
            "mean (sd)" = ~ qwraps2::mean_sd(iris$Petal.Width)),
        "Species" =
       list("Setosa" = ~ qwraps2::n_perc0(iris$Species == "setosa"),
            "Versicolor"  = ~ qwraps2::n_perc0(iris$Species == "versicolor"),
            "Virginica"  = ~ qwraps2::n_perc0(iris$Species == "virginica"))
       )

bytype <- qwraps2::summary_table(dplyr::group_by(iris,Species),our_summary1)
bytype

The output I get is: (image: output from the above code)

This doesn't make sense: it says that the statistics for each variable are identical across the flower species, which they are not. I cross-checked this by running:

aggregate(iris[1:4], list(iris$Species), mean)

which shows that, for example, the mean of each variable varies across species.

Why is dplyr::group_by not doing what it should?

I posted the output as best I could; sorry, and thank you for your understanding.


Solution

  • The group_by call appears to do nothing because the summary definition does not use the data pronoun .data. As written, every formula refers to iris directly (for example, iris$Sepal.Length), so each statistic is computed on the whole iris data set, regardless of any grouping or subsetting. The .data pronoun is needed so that the tidyverse tools behind summary_table evaluate each expression against the rows of the current group.

    library(datasets)
    library(qwraps2)
    library(dplyr)
    #> 
    #> Attaching package: 'dplyr'
    #> The following objects are masked from 'package:stats':
    #> 
    #>     filter, lag
    #> The following objects are masked from 'package:base':
    #> 
    #>     intersect, setdiff, setequal, union
    
    data("iris")
    options(qwraps2_markup = "markdown")
    
    our_summary1 <-
      list("Sepal Length" =
           list("min" = ~ min(.data$Sepal.Length),
                "max" = ~ max(.data$Sepal.Length),
                "mean (sd)" = ~ qwraps2::mean_sd(.data$Sepal.Length)),
           "Sepal Width" =
           list("min" = ~ min(.data$Sepal.Width),
                "median" = ~ median(.data$Sepal.Width),
                "max" = ~ max(.data$Sepal.Width),
                "mean (sd)" = ~ qwraps2::mean_sd(.data$Sepal.Width)),
           "Petal Length" =
           list("min" = ~ min(.data$Petal.Length),
                "max" = ~ max(.data$Petal.Length),
                "mean (sd)" = ~ qwraps2::mean_sd(.data$Petal.Length)),
           "Petal Width" =
           list("min" = ~ min(.data$Petal.Width),
                "max" = ~ max(.data$Petal.Width),
                "mean (sd)" = ~ qwraps2::mean_sd(.data$Petal.Width)),
            "Species" =
           list("Setosa" = ~ qwraps2::n_perc0(.data$Species == "setosa"),
                "Versicolor"  = ~ qwraps2::n_perc0(.data$Species == "versicolor"),
                "Virginica"  = ~ qwraps2::n_perc0(.data$Species == "virginica"))
           )
    
    
    bytype <- qwraps2::summary_table(dplyr::group_by(iris,Species),our_summary1)
    bytype
    #> 
    #> 
    #> |                        |Species: setosa (N = 50) |Species: versicolor (N = 50) |Species: virginica (N = 50) |
    #> |:-----------------------|:------------------------|:----------------------------|:---------------------------|
    #> |**Sepal Length**        |&nbsp;&nbsp;             |&nbsp;&nbsp;                 |&nbsp;&nbsp;                |
    #> |&nbsp;&nbsp; min        |4.3                      |4.9                          |4.9                         |
    #> |&nbsp;&nbsp; max        |5.8                      |7.0                          |7.9                         |
    #> |&nbsp;&nbsp; mean (sd)  |5.01 &plusmn; 0.35       |5.94 &plusmn; 0.52           |6.59 &plusmn; 0.64          |
    #> |**Sepal Width**         |&nbsp;&nbsp;             |&nbsp;&nbsp;                 |&nbsp;&nbsp;                |
    #> |&nbsp;&nbsp; min        |2.3                      |2.0                          |2.2                         |
    #> |&nbsp;&nbsp; median     |3.4                      |2.8                          |3.0                         |
    #> |&nbsp;&nbsp; max        |4.4                      |3.4                          |3.8                         |
    #> |&nbsp;&nbsp; mean (sd)  |3.43 &plusmn; 0.38       |2.77 &plusmn; 0.31           |2.97 &plusmn; 0.32          |
    #> |**Petal Length**        |&nbsp;&nbsp;             |&nbsp;&nbsp;                 |&nbsp;&nbsp;                |
    #> |&nbsp;&nbsp; min        |1.0                      |3.0                          |4.5                         |
    #> |&nbsp;&nbsp; max        |1.9                      |5.1                          |6.9                         |
    #> |&nbsp;&nbsp; mean (sd)  |1.46 &plusmn; 0.17       |4.26 &plusmn; 0.47           |5.55 &plusmn; 0.55          |
    #> |**Petal Width**         |&nbsp;&nbsp;             |&nbsp;&nbsp;                 |&nbsp;&nbsp;                |
    #> |&nbsp;&nbsp; min        |0.1                      |1.0                          |1.4                         |
    #> |&nbsp;&nbsp; max        |0.6                      |1.8                          |2.5                         |
    #> |&nbsp;&nbsp; mean (sd)  |0.25 &plusmn; 0.11       |1.33 &plusmn; 0.20           |2.03 &plusmn; 0.27          |
    #> |**Species**             |&nbsp;&nbsp;             |&nbsp;&nbsp;                 |&nbsp;&nbsp;                |
    #> |&nbsp;&nbsp; Setosa     |50 (100)                 |0 (0)                        |0 (0)                       |
    #> |&nbsp;&nbsp; Versicolor |0 (0)                    |50 (100)                     |0 (0)                       |
    #> |&nbsp;&nbsp; Virginica  |0 (0)                    |0 (0)                        |50 (100)                    |
    

    Created on 2020-03-01 by the reprex package (v0.3.0)
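
The scoping difference described above can be seen without qwraps2 at all. The following is only a minimal sketch that uses dplyr::summarise as a stand-in; it does not show how summary_table is implemented internally, and the names wrong, right, agg, and m are made up for illustration. Referring to iris$Sepal.Length inside a grouped call reaches out to the full data frame, while .data$Sepal.Length refers to the rows of the current group.

```r
library(dplyr)

# Referencing the global data frame ignores the grouping:
# every species gets the mean of all 150 rows.
wrong <- iris %>%
  group_by(Species) %>%
  summarise(m = mean(iris$Sepal.Length))

# The .data pronoun refers to the current group's rows,
# so each species gets its own mean.
right <- iris %>%
  group_by(Species) %>%
  summarise(m = mean(.data$Sepal.Length))

# Cross-check against the aggregate() call from the question.
agg <- aggregate(iris[1:4], list(iris$Species), mean)

length(unique(wrong$m))                  # one distinct value for all groups
all.equal(right$m, agg$Sepal.Length)     # per-group means agree
```

If the first result has a single distinct value while the second matches aggregate, the grouping is working and only the way the column is referenced was at fault.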
