Search code examples
rcrosstab

How to transform aggregated counts into a contingency table?


I would like to use a dataframe of census data of several countries and make a contingency table with row/column/total percentages out of it. The thing I am struggling with is that the data is already aggregated in the form of a contingency table. How do I convert the raw counts into a contingency table?

set.seed(1)
country <- c(rep("countryA", 6), rep("countryB", 6))
age <- c(rep(c("0-14", "15-24", "25-59"), 2), rep(c("0-18", "18-30", "30-60"), 2))
sex <- rep(c(rep("female", 3), rep("male", 3)), 2)
count <- abs(round(rnorm(12, 1000000, 500000)))
df <- data.frame(country, age, sex, count)

Note that in the data I have different census bureaus report census data for different age brackets as reflected in the MWE.

What I would like is this:

[[1]] CountryA
                  0-14                15-24                25-59
Female            row/col/total%      row/col/total%       row/col/total%      Row margins %
Male              row/col/total%      row/col/total%       row/col/total%      Row margins %
                  Column margins %    Column margins %     Column margins %    Total margins %

[[2]] CountryB
                  0-18                18-30                30-60
Female            row/col/total%      row/col/total%       row/col/total%      Row margins %
Male              row/col/total%      row/col/total%       row/col/total%      Row margins %
                  Column margins %    Column margins %     Column margins %    Total margins %

Where row/col/total% should be options to choose from and do not necessarily need to be displayed in the same table. I am also flexible as to the class of the final output. I imagine it to be a list but it wouldn't have to be.

I am aware of prop.table(table(), margins =1/2), janitor::tabyl(), gmodels::CrossTable(), stats::xtabs, Deducer::contingency.tables as well as http://pcwww.liv.ac.uk/~william/R/crosstab.r. The problem that I have with each of these packages is that they take individual observations as inputs and I cannot get them to work with aggregated counts.

Bonus: In addition to the percentages, it would be nice to reformat the dataframe into a table of counts which would look like the one I have shown but for the counts of the original dataframe.


Solution

  • An option would be to split the dataset by 'country', create a summary table with xtabs and apply the prop.table

    lapply(split(df[-1], df$country), 
            function(x) prop.table(xtabs(count ~ sex + age, droplevels(x))))
    

    It is also possible that the OP wanted

    lapply(split(df[-1], df$country), function(x) {
       x1 <- xtabs(count ~ sex + age, droplevels(x))
       x2 <- addmargins(x1)
       x2[-nrow(x2),-ncol(x2)] <- x2[-nrow(x2),-ncol(x2)]/x2[nrow(x2),
                          -ncol(x2)]/x2[length(x2)]
       x2})