Search code examples
rconditional-formattingexpss

Conditional formatting in expss tables


My question can be considering an extension of the following discussion: R expss package: format numbers by statistic / apply different format to alternate rows

I would like to understand the grammar of conditions to be able to write my own custom formats. Consider the 'insert' dataframe from datasets. Then we create the following table thanks to expss:

infert %>%
  tab_cells(parity) %>%
  ### TOTAL
  tab_cols(total()) %>%
  tab_stat_cases(label="N", total_row_position="none") %>%
  ### OTHER VARIABLES
  tab_cols(education) %>%
  tab_stat_cases(label="N", total_row_position="none") %>% 
  tab_stat_cpct(label="%Col.", total_row_position="none") %>%
  tab_pivot(stat_position="inside_columns") %>%
  format_vert()

The last line operates basic formatting, as discussed in the URL above. In details:

format_vert = function(tbl, pct_digits=2, n_digits=0){
  #Finding columns to format
  pct_cols = grepl("\\|%Col.$", names(tbl), perl = TRUE)
  n_cols = grepl("\\|N$", names(tbl), perl = TRUE)
  #Format
  recode(tbl[,-1]) = other ~ function(x) ifelse(is.numeric(x) & is.na(x), 0, x)
  tbl[,pct_cols] = format(tbl[,pct_cols], digits=pct_digits, nsmall=pct_digits) 
  tbl[,n_cols] = format(tbl[,n_cols], digits=n_digits, nsmall=n_digits)
  recode(tbl[,pct_cols]) = other ~ function(x) paste0(x, "%")
  tbl
}

I understand how to format whole tables or columns (experts would have noticed the differences vs. the example in the URL), but what if I only wish to format specific cells? For instance, how to set digits=0 when value = 100,00% (to only show 100%) ? I don't know if I should go for recode, format, when and where to reference tbl[,pct_cols]...

Thank you!


Solution

  • The simplest way is to insert additional recodings into recode in the function format_vert. We can't use recoding in the form of '100.00' ~ '100' because columns are already aligned with spaces. So we use regular expressions. perl means perl-style regex comparison and \\b means word boundary. All values which will match with such expression will be recoded.

    data(infert)
    format_vert = function(tbl, pct_digits=2, n_digits=0){
        #Finding columns to format
        pct_cols = grepl("\\|%Col.$", names(tbl), perl = TRUE)
        n_cols = grepl("\\|N$", names(tbl), perl = TRUE)
        #Format
        recode(tbl[,-1]) = other ~ function(x) ifelse(is.numeric(x) & is.na(x), 0, x)
        tbl[,pct_cols] = format(tbl[,pct_cols], digits=pct_digits, nsmall=pct_digits) 
        tbl[,n_cols] = format(tbl[,n_cols], digits=n_digits, nsmall=n_digits)
        recode(tbl[,pct_cols]) = c(
            perl("\\b0.00\\b") ~ "0%   ",  # additional recodings
            perl("\\b100.00\\b") ~ "100%   ", # additional recodings
            other ~ function(x) paste0(x, "%")
        )
        tbl
    }
    
    infert %>%
        tab_cells(parity) %>%
        ### TOTAL
        tab_cols(total()) %>%
        tab_stat_cases(label="N", total_row_position="none") %>%
        ### OTHER VARIABLES
        tab_cols(education) %>%
        tab_stat_cases(label="N", total_row_position="none") %>% 
        tab_stat_cpct(label="%Col.", total_row_position="none") %>%
        tab_pivot(stat_position="inside_columns") %>%
        format_vert()