I am trying out the qwraps2 package and some of its functions. In particular, I am interested in the summary_table function for producing output. I am using the iris data set for practice, but I noticed something strange when using group_by with summary_table:
library(datasets)
data("iris")
options(qwraps2_markup = "markdown")
our_summary1 <-
list("Sepal Length" =
list("min" = ~ min(iris$Sepal.Length),
"max" = ~ max(iris$Sepal.Length),
"mean (sd)" = ~ qwraps2::mean_sd(iris$Sepal.Length)),
"Sepal Width" =
list("min" = ~ min(iris$Sepal.Width),
"median" = ~ median(iris$Sepal.Width),
"max" = ~ max(iris$Sepal.Width),
"mean (sd)" = ~ qwraps2::mean_sd(iris$Sepal.Width)),
"Petal Length" =
list("min" = ~ min(iris$Petal.Length),
"max" = ~ max(iris$Petal.Length),
"mean (sd)" = ~ qwraps2::mean_sd(iris$Petal.Length)),
"Petal Width" =
list("min" = ~ min(iris$Petal.Width),
"max" = ~ max(iris$Petal.Width),
"mean (sd)" = ~ qwraps2::mean_sd(iris$Petal.Width)),
"Species" =
list("Setosa" = ~ qwraps2::n_perc0(iris$Species == "setosa"),
"Versicolor" = ~ qwraps2::n_perc0(iris$Species == "versicolor"),
"Virginica" = ~ qwraps2::n_perc0(iris$Species == "virginica"))
)
bytype <- qwraps2::summary_table(dplyr::group_by(iris,Species),our_summary1)
bytype
The output I get is (posted as an image: output from the above code)
This doesn't make sense: it says that the statistics for each variable are the same across the flower species, which they are not. I cross-checked this by running:
aggregate(iris[1:4], list(iris$Species), mean)
which shows that, for example, the means of the variables vary across species.
Why is dplyr::group_by not doing what it should?
I posted the output as best I could; sorry, and thank you for your understanding.
The reason the group_by call does not appear to do anything is that the data pronoun .data is not being used in the summary definition. As written, every formula refers to iris directly, so the summary table is computed from the whole iris data set, regardless of any grouping or subsetting. The .data pronoun is needed so that the tidyverse tools behind summary_table use the correct scoping.
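To see why the scoping matters, here is a minimal base R sketch. It is only an analogy, not how summary_table is implemented: the names pieces, whole, and grouped are made up for illustration, and split() stands in for the grouping step.

```r
# Base R analogy (not the qwraps2 internals): split iris into one data frame
# per species, then evaluate a summary on each piece.
data("iris")
pieces <- split(iris, iris$Species)

# Refers to the global iris object, so the piece `d` is ignored and every
# species gets the same overall mean.
whole <- sapply(pieces, function(d) mean(iris$Sepal.Length))

# Refers to the piece itself, which is the role .data plays in the summary.
grouped <- sapply(pieces, function(d) mean(d$Sepal.Length))

print(round(whole, 3))
print(round(grouped, 3))
```

The first vector repeats one number for all three species, while the second differs by species, which mirrors the difference between writing iris$Sepal.Length and .data$Sepal.Length in the summary list.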
library(datasets)
library(qwraps2)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
data("iris")
options(qwraps2_markup = "markdown")
our_summary1 <-
list("Sepal Length" =
list("min" = ~ min(.data$Sepal.Length),
"max" = ~ max(.data$Sepal.Length),
"mean (sd)" = ~ qwraps2::mean_sd(.data$Sepal.Length)),
"Sepal Width" =
list("min" = ~ min(.data$Sepal.Width),
"median" = ~ median(.data$Sepal.Width),
"max" = ~ max(.data$Sepal.Width),
"mean (sd)" = ~ qwraps2::mean_sd(.data$Sepal.Width)),
"Petal Length" =
list("min" = ~ min(.data$Petal.Length),
"max" = ~ max(.data$Petal.Length),
"mean (sd)" = ~ qwraps2::mean_sd(.data$Petal.Length)),
"Petal Width" =
list("min" = ~ min(.data$Petal.Width),
"max" = ~ max(.data$Petal.Width),
"mean (sd)" = ~ qwraps2::mean_sd(.data$Petal.Width)),
"Species" =
list("Setosa" = ~ qwraps2::n_perc0(.data$Species == "setosa"),
"Versicolor" = ~ qwraps2::n_perc0(.data$Species == "versicolor"),
"Virginica" = ~ qwraps2::n_perc0(.data$Species == "virginica"))
)
bytype <- qwraps2::summary_table(dplyr::group_by(iris,Species),our_summary1)
bytype
#>
#>
#> | |Species: setosa (N = 50) |Species: versicolor (N = 50) |Species: virginica (N = 50) |
#> |:-----------------------|:------------------------|:----------------------------|:---------------------------|
#> |**Sepal Length** | | | |
#> | min |4.3 |4.9 |4.9 |
#> | max |5.8 |7.0 |7.9 |
#> | mean (sd) |5.01 ± 0.35 |5.94 ± 0.52 |6.59 ± 0.64 |
#> |**Sepal Width** | | | |
#> | min |2.3 |2.0 |2.2 |
#> | median |3.4 |2.8 |3.0 |
#> | max |4.4 |3.4 |3.8 |
#> | mean (sd) |3.43 ± 0.38 |2.77 ± 0.31 |2.97 ± 0.32 |
#> |**Petal Length** | | | |
#> | min |1.0 |3.0 |4.5 |
#> | max |1.9 |5.1 |6.9 |
#> | mean (sd) |1.46 ± 0.17 |4.26 ± 0.47 |5.55 ± 0.55 |
#> |**Petal Width** | | | |
#> | min |0.1 |1.0 |1.4 |
#> | max |0.6 |1.8 |2.5 |
#> | mean (sd) |0.25 ± 0.11 |1.33 ± 0.20 |2.03 ± 0.27 |
#> |**Species** | | | |
#> | Setosa |50 (100) |0 (0) |0 (0) |
#> | Versicolor |0 (0) |50 (100) |0 (0) |
#> | Virginica |0 (0) |0 (0) |50 (100) |
Created on 2020-03-01 by the reprex package (v0.3.0)