Using R, I want to create a table with nested columns (and possibly nested rows). The cell values should include frequencies and within-sub-group totals and percentages (i.e., cell/[sub-group row total]*100).
I would prefer that the solution uses the tables package. I will be outputting to LaTeX.
Error in Percent("row") : Summary fn not allowed with Percent
library(tables)
set.seed(123)
df <- data.frame(exposure = sample(LETTERS[1:5], 100, TRUE),
Group = sample(c("GroupX","GroupY"), 100, TRUE),
disease = as.integer(sample(c(0,1), 100, TRUE)))
num <- function(x) base::sum(x, na.rm=TRUE)
tabular(Factor(exposure)+1~
Factor(Group)*
(Heading()*num*Heading(One)*disease*
((Total=1)+Percent("row"))),
data=df)
Group
GroupX GroupY
num num
exposure Total disease Total disease
A 9 4 13 6
B 12 4 9 5
C 9 8 9 6
D 7 1 8 3
E 9 4 15 12
All 46 21 54 32
Here is a start:
tabular(Factor(exposure) + 1 ~
Factor(Group) *
((Total = 1) + num * disease), data = df)
Percent()
. Why would one use a logical vector with Percent()
. Can you give an example? Would using a logical vector help me with this problem?This is similar to this question; however, the offered answer calculates incorrect percentages as evidenced by an example with more than 2 columns.
As of version 0.7.72, the tables
package can calculate sub-group percentages. Credit for the commits and this answer goes to Duncan Murdoch, the package maintainer.
The updated source package is available by SVN from rForge. General installation instructions for installing source packages can be found here. Refer to answer 2 below for usage. Binary packages might be available by the time you read this.
1) The tables
package will only calculate one thing per column. Percent
is effectively a "summary function" and by defining a new summary function, num
, I've asked it to calculate num in all of the columns, and also to calculate Percent
in some of them. If I use my num
function, I need to move it within the parentheses so it doesn't end up "multiplied" (in the tables
grammar sense) by Percent
. The following code will generate a count of persons with disease (i.e., disease==1) and it will generate a row percentage (Group Total/Row Total * 100), not the desired cell/(sub-group row total). With versions of tables
< 0.7.72, that's as far as we can get.
library(tables) ## prior to 0.7.72
df <- data.frame(exposure = sample(LETTERS[1:5], 100, TRUE),
Group = sample(c("GroupX","GroupY"), 100, TRUE),
disease = as.integer(sample(c(0,1), 100, TRUE)))
num <- function(x) base::sum(x, na.rm=TRUE)
tabular(Factor(exposure)+1~
Factor(Group)*
(Heading("Group Total")*(1)+num*disease+Percent("row")),
data=df)
2) Version 0.7.72 of the tables
package will calculate the desired sub-group percentages. It introduces a pseudo-function called Equal()
.
set.seed(100)
library(tables)
df <- data.frame(exposure = sample(LETTERS[1:5], 100, TRUE),
Group = sample(c("GroupX","GroupY"), 100, TRUE),
disease = as.integer(sample(c(0,1), 100, TRUE)))
myTable <- tabular(Factor(exposure)+1~
Factor(Group)*
(Heading("Group Total")*(1)+Factor(disease)*((n=1)+Heading("%")*Percent(Equal(exposure,Group)))),
data=df)
myTable
myTable generates the following output:
Group
GroupX GroupY
disease disease
0 1 0 1
exposure Group Total n % n % Group Total n % n %
A 5 1 20.00 4 80.00 6 3 50.00 3 50.00
B 17 12 70.59 5 29.41 10 3 30.00 7 70.00
C 13 4 30.77 9 69.23 10 6 60.00 4 40.00
D 8 2 25.00 6 75.00 13 7 53.85 6 46.15
E 7 3 42.86 4 57.14 11 8 72.73 3 27.27
All 50 22 44.00 28 56.00 50 27 54.00 23 46.00
From Duncan's explanation,
"The general way to read the code above is 'show the percentage of the values in the current cell relative to the values in all cells with equal x and y.'
The x and y are now taken as expressions; it effectively looks through the formula for places where subsetting occurs, and ignores subsetting by other variables."
The last step is to subset the table, like a matrix, to keep only the desired columns (and/or rows), as demonstrated by the last example in the help file of tabular()
:
myTable[,c(1,4,5,6,9,10)]
This gives the final result:
Group
GroupX GroupY
disease disease
1 1
exposure Group Total n % Group Total n %
A 5 4 80.00 6 3 50.00
B 17 5 29.41 10 7 70.00
C 13 9 69.23 10 4 40.00
D 8 6 75.00 13 6 46.15
E 7 4 57.14 11 3 27.27
All 50 28 56.00 50 23 46.00