Search code examples
rexpss

How to create count and percentage tables and linegraphs with 1 independent variable and 3 dependent ones


I'm an R neophyte, and somehow this problem seems like it should be trivial to solve. But unfortunately, I haven't been able to do so after about three days of searching and experimenting.

My data is in a form close to wideform:

color   agegroup    sex     ses
red     2           Female  A
blue    2           Female  C
green   5           Male    D
red     3           Female  A
red     2           Male    B
blue    1           Female  B
...

I'm trying to create presentable tables with counts and percentages of the dependent variable (color here) organized by sex, ses and agegroup. I need one table organized by ses and sex for each agegroup, with counts next to the percentages, like this:

agegroup:                                  1
sex:                  Female                               Male
ses:        A       B       C       D           A       B       C       D
color:
red         2 1%    0  0%   8 4%    22 11%      16 8%   2   1%  8   4%  3 1.5%
blue        9 4.5%  6  3%   4 2%    2  1%       12 6%   32 16%  14  7%  6   3%
green       4 2%    12 6%   2 1%    8  4%       0  0%   22 11%  40 20%  0   0%

agegroup:                               2
sex:                  Female                               Male
ses:        A       B       C       D           A       B       C       D
color:
red         2 1%    0  0%   8 4%    22 11%      16 8%   2   1%  8   4%  3 1.5%
blue        9 4.5%  6  3%   4 2%    2  1%       12 6%   32 16%  14  7%  6   3%
green       4 2%    12 6%   2 1%    8  4%       0  0%   22 11%  40 20%  0   0%

I've been trying to do this with everything from datatables and expss to gmodels, but I just can't figure out how to get output like this. CrossTables from gmodels comes closest, but it's still pretty far away -- (1) it puts percentages under counts, (2) I can't get it to nest sel under sex, (3) I can't figure out how to get it to disgregate the results by generation, and (4) the output is full of dashes, vertical pipes and spaces which make putting it into a word processor or spreadsheet an error-prone manual affair.

EDIT: I removed my second question (about line plots), because the answer to the first question is perfect and deserves credit, even if it doesn't touch on the second one. I'll ask the second question separately, as I should have from the start.


Solution

  • The closest result with expss package:

    library(expss)
    # generate example data
    set.seed(123)
    N = 300
    df = data.frame(
        color = sample(c("red", "blue", "green"), size = N, replace = TRUE),
        agegroup = sample(1:5, size = N, replace = TRUE),
        sex = sample(c("Male", "Female"), size = N, replace = TRUE),
        ses = sample(c("A", "B", "C", "D"),  size = N, replace = TRUE),
        stringsAsFactors = FALSE
    )
    
    # redirect output to RStudio HTML viewer
    expss_output_viewer()
    res = df %>% 
        tab_cells("|" = color) %>% # dependent variable, "|" used to suppress label
        tab_cols(sex %nest% ses) %>% # column variable
        tab_rows(agegroup) %>% 
        tab_total_row_position("none") %>% # we don't need total
        tab_stat_cases(label = "Cases") %>% # calculate cases
        tab_stat_cpct(label = "%") %>% # calculate percent
        tab_pivot(stat_position = "inside_columns") %>% # finalize table
        make_subheadings(number_of_columns = 2)
    
    # difficult part - add percent sign
    for(i in grep("%", colnames(res))){
        res[[i]] = ifelse(trimws(res[[i]])!="", 
                          paste0(round(res[[i]], 1), "%"),
                          res[[i]] 
                          )
    }
    
    # additionlly remove stat labels
    colnames(res) = gsub("\\|Cases|%", "", colnames(res), perl = TRUE)
    
    res
    

    In the RStudio Viewer result will be in the HTML format (see image). Unfortunately, I can't test how it will be pasted to the MS Word. enter image description here Disclaimer: I am an author of expss package.