The goal is NOT cross tabulation. I am aiming for a table with descriptive statistics (in this case, counts & percentages) for all the variables in a dataset so that I do not need to type table(name of variable) approx. 20 times while working on a new dataset. I plan on just exporting this using xtable + knitr. Unfortunately, there is something wrong with the loop. Any help would be appreciated. This question is very relevant to people who repeatedly need to present summary statistics as this would save a lot of time. Sorry for asking this a 3rd time :) I have heard of the table() function - in fact, I've used it several times below In a nutshell, this function presents the names of variables, their levels, the counts & percentages without making someone type in table() for each variable in a dataset-
ESRD <- rep(c("Y", "N"), each=10)
DIABETES <- rep(c("Y", "N", "Y", "N"), c(5, 5, 5, 5))
BLAH <- rep(c("Y", "N"), each=10)
categoricalvariables <- data.frame(ESRD, DIABETES, BLAH)
descriptives <- function(VARIABLEMATRIX){
desc <- matrix(0, ncol=4, nrow=2*ncol(VARIABLEMATRIX) + ncol(VARIABLEMATRIX))
for (i in 1:ncol(VARIABLEMATRIX)){
matper <- matrix(0, nrow=dim(table(VARIABLEMATRIX[ ,i])), ncol=1)
for (i in 1:dim(table(VARIABLEMATRIX[ ,i]))){
matper[i, ] <- paste(round(prop.table(table(VARIABLEMATRIX[ ,i]))[i]*100, 2), "%")
}
matcount <- matrix(0, nrow=dim(table(VARIABLEMATRIX[ ,i])), ncol=1)
for (i in 1:dim(table(VARIABLEMATRIX[ ,i]))){
matcount[i, ] <- table(VARIABLEMATRIX[ ,i])[i]
}
desc[((3*i)-2), ] <- c(colnames(VARIABLEMATRIX)[i], "", "", "")
desc[((3*i)-1):(3*i), ] <- cbind("", names(table(VARIABLEMATRIX[ ,i])), matcount[ ,1], matper[ ,1])
return(desc)
}
}
descriptives(categoricalvariables)
The output I am getting is (clearly there is a bug but I am not sure what is wrong):
[,1] [,2] [,3] [,4]
[1,] "0" "0" "0" "0"
[2,] "0" "0" "0" "0"
[3,] "0" "0" "0" "0"
[4,] "DIABETES" "" "" ""
[5,] "" "N" "10" "50 %"
[6,] "" "Y" "10" "50 %"
[7,] "0" "0" "0" "0"
[8,] "0" "0" "0" "0"
[9,] "0" "0" "0" "0"
The expected output should be:
[,1] [,2] [,3] [,4]
[1,] "ESRD" "" "" ""
[2,] "" "N" "10" "50 %"
[3,] "" "Y" "10" "50 %"
[4,] "DIABETES" "" "" ""
[5,] "" "N" "10" "50 %"
[6,] "" "Y" "10" "50 %"
[7,] "BLAH" "" "" ""
[8,] "" "N" "10" "50 %"
[9,] "" "Y" "10" "50 %"
Here are some options using tidyverse
functions:
library(tidyverse)
categoricalvariables %>%
gather(Measure, Value) %>%
group_by(Measure, Value) %>%
tally %>%
mutate(Percent=n/sum(n))
Measure Value n Percent 1 BLAH N 10 0.5 2 BLAH Y 10 0.5 3 DIABETES N 10 0.5 4 DIABETES Y 10 0.5 5 ESRD N 10 0.5 6 ESRD Y 10 0.5
categoricalvariables %>%
gather(Measure, Value) %>%
group_by(Measure, Value) %>%
tally %>%
mutate(Percent=n/sum(n)) %>%
gather(Stats, Value2, -Measure, -Value) %>%
unite(Value_Stats, Stats, Value) %>%
spread(Value_Stats, Value2)
Measure n_N n_Y Percent_N Percent_Y 1 BLAH 10 10 0.5 0.5 2 DIABETES 10 10 0.5 0.5 3 ESRD 10 10 0.5 0.5
I've left the data in numeric format in case you want to do further processing.
To set things up for an exported table, maybe something like this:
tab = categoricalvariables %>%
gather(Measure, Value) %>%
group_by(Measure, Value) %>%
summarise(Count=n()) %>%
mutate(Percent=paste0(sprintf("%1.1f", Count/sum(Count)*100),"%")) %>%
ungroup %>%
mutate(Measure = ifelse(duplicated(Measure),"", Measure))
Measure Value Count Percent 1 BLAH N 10 50.0% 2 Y 10 50.0% 3 DIABETES N 10 50.0% 4 Y 10 50.0% 5 ESRD N 10 50.0% 6 Y 10 50.0%
Now you can run xtable
on tab
.
library(xtable)
print(xtable(tab, align="llcrr"), include.rownames=FALSE)
Which looks like this when output from an rmarkdown
document to PDF:
If you have numeric columns that you want to summarise, you can do, for example (using the built-in iris
data frame):
iris %>% group_by(Species) %>%
summarise_all(funs(mean, min, max)) %>%
gather(key, value, -Species) %>%
separate(key, c("Measure","Stat"),"_") %>%
spread(Stat, value)
You'd probably want to reshape this further or reformat for output to a table, but it gives you an idea of what's possible.
Species Measure max mean min 1 setosa Petal.Length 1.9 1.462 1.0 2 setosa Petal.Width 0.6 0.246 0.1 3 setosa Sepal.Length 5.8 5.006 4.3 4 setosa Sepal.Width 4.4 3.428 2.3 5 versicolor Petal.Length 5.1 4.260 3.0 6 versicolor Petal.Width 1.8 1.326 1.0 7 versicolor Sepal.Length 7.0 5.936 4.9 8 versicolor Sepal.Width 3.4 2.770 2.0 9 virginica Petal.Length 6.9 5.552 4.5 10 virginica Petal.Width 2.5 2.026 1.4 11 virginica Sepal.Length 7.9 6.588 4.9 12 virginica Sepal.Width 3.8 2.974 2.2