I would like to use a dataframe of census data of several countries and make a contingency table with row/column/total percentages out of it. The thing I am struggling with is that the data is already aggregated in the form of a contingency table. How do I convert the raw counts into a contingency table?
set.seed(1)
country <- c(rep("countryA", 6), rep("countryB", 6))
age <- c(rep(c("0-14", "15-24", "25-59"), 2), rep(c("0-18", "18-30", "30-60"), 2))
sex <- rep(c(rep("female", 3), rep("male", 3)), 2)
count <- abs(round(rnorm(12, 1000000, 500000)))
df <- data.frame(country, age, sex, count)
Note that in the data I have different census bureaus report census data for different age brackets as reflected in the MWE.
What I would like is this:
[[1]] CountryA
0-14 15-24 25-59
Female row/col/total% row/col/total% row/col/total% Row margins %
Male row/col/total% row/col/total% row/col/total% Row margins %
Column margins % Column margins % Column margins % Total margins %
[[2]] CountryB
0-18 18-30 30-60
Female row/col/total% row/col/total% row/col/total% Row margins %
Male row/col/total% row/col/total% row/col/total% Row margins %
Column margins % Column margins % Column margins % Total margins %
Where row/col/total% should be options to choose from and do not necessarily need to be displayed in the same table. I am also flexible as to the class of the final output. I imagine it to be a list but it wouldn't have to be.
I am aware of prop.table(table(), margins =1/2), janitor::tabyl(), gmodels::CrossTable(), stats::xtabs, Deducer::contingency.tables as well as http://pcwww.liv.ac.uk/~william/R/crosstab.r. The problem that I have with each of these packages is that they take individual observations as inputs and I cannot get them to work with aggregated counts.
Bonus: In addition to the percentages, it would be nice to reformat the dataframe into a table of counts which would look like the one I have shown but for the counts of the original dataframe.
An option would be to split
the dataset by 'country', create a summary table with xtabs
and apply the prop.table
lapply(split(df[-1], df$country),
function(x) prop.table(xtabs(count ~ sex + age, droplevels(x))))
It is also possible that the OP wanted
lapply(split(df[-1], df$country), function(x) {
x1 <- xtabs(count ~ sex + age, droplevels(x))
x2 <- addmargins(x1)
x2[-nrow(x2),-ncol(x2)] <- x2[-nrow(x2),-ncol(x2)]/x2[nrow(x2),
-ncol(x2)]/x2[length(x2)]
x2})