How to automatically modify interval factor levels for better display


Suppose you have data that looks something like this

df <- data.frame(income = rnorm(1000,77345,30569))

You add a factor column indicating the quartile interval that each observation falls into

df$quant <- cut(df$income, quantile(df$income))

The factor levels look something like this

Levels: (-4.48e+04,5.6e+04] (5.6e+04,7.69e+04] (7.69e+04,9.73e+04] (9.73e+04,1.64e+05]

How can you programmatically, not manually, change the intervals so they print out nicely in a frequency summary table?

df %>% count(quant)

Which prints like this:

               quant   n
1 (-4.48e+04,5.6e+04] 249
2  (5.6e+04,7.69e+04] 250
3 (7.69e+04,9.73e+04] 250
4 (9.73e+04,1.64e+05] 250

I want it to look something like this

              quant   n
1 (-$44,800,$56,000] 249
2  ($56,000,$76,900] 250
3  ($76,900,$97,300] 250
4 ($97,300,$164,000] 250

This is just for printing purposes (in an R Markdown report). I have already done all the calculations and plotting without a problem.


Solution

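  • cut2 from Hmisc can take a formatfun argument, a function applied to the cut points when the level labels are built. Note that cut2 labels intervals as closed on the left, [a,b), whereas cut defaults to (a,b].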

    library(Hmisc)   # cut2()
    library(scales)  # comma()
    # prefix each comma-formatted cut point with a dollar sign
    df$quant2 <- cut2(df$income, digits = 5, cuts = quantile(df$income),
                      formatfun = function(x) paste0("$", comma(x)), onlycuts = TRUE)
    

    -output

    > head(df)
         income             quant2               quant
    1  60657.97  [$55,485,$76,547) (5.55e+04,7.65e+04]
    2  93747.88  [$76,547,$96,620) (7.65e+04,9.66e+04]
    3  90172.46  [$76,547,$96,620) (7.65e+04,9.66e+04]
    4  59504.10  [$55,485,$76,547) (5.55e+04,7.65e+04]
    5 103251.01 [$96,620,$178,251] (9.66e+04,1.78e+05]
    6  85477.03  [$76,547,$96,620) (7.65e+04,9.66e+04]
    

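    If we want to modify the original cut column instead, we can strip the brackets, split the two bounds into numeric columns, format them with dollar, and paste them back together. Note that quant then becomes a character (glue) column rather than a factor.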

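    library(dplyr)    # %>%, mutate(), across()
    library(tidyr)    # separate()
    library(stringr)  # str_remove_all()
    library(scales)   # dollar()
    df <- df %>%
         # drop the enclosing "(" and "]" from each label
         mutate(quant = str_remove_all(quant, "\\(|\\]")) %>%
         # split into lower/upper bounds; convert = TRUE makes them numeric
         separate(quant, into = c('q1', 'q2'), sep = ",", convert = TRUE) %>%
         # format the bounds as dollars, rebuild the label, drop the helper columns
         mutate(across(q1:q2, ~ dollar(.x)),
                quant = glue::glue("({q1},{q2}]"), q1 = NULL, q2 = NULL)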
    

    -output

    > head(df)
         income              quant
    1  60657.97  ($55,500,$76,500]
    2  93747.88  ($76,500,$96,600]
    3  90172.46  ($76,500,$96,600]
    4  59504.10  ($55,500,$76,500]
    5 103251.01 ($96,600,$178,000]
    6  85477.03  ($76,500,$96,600]
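
    A possible alternative, sketched here in base R rather than taken from the answer above, is to build the labels from the break points and hand them to cut() through its labels argument, so that quant stays a factor with its levels in order. fmt_dollar is a hypothetical helper written for this sketch, not a function from any package. The seed is added only to make the example reproducible, and rounding to three significant digits is an assumption chosen to mirror the desired output in the question. It also assumes the rounded quartile breaks are distinct, since factor labels must be unique.

```r
set.seed(42)  # assumption: seed added only so the sketch is reproducible
df <- data.frame(income = rnorm(1000, 77345, 30569))

# Quartile break points, as in the question
brks <- quantile(df$income)

# Hypothetical helper: whole dollars with thousands separators,
# keeping a minus sign in front of the dollar sign for negative values
fmt_dollar <- function(x) {
  paste0(ifelse(x < 0, "-", ""), "$",
         formatC(abs(x), format = "f", digits = 0, big.mark = ","))
}

# Round the breaks to 3 significant digits for display only
shown <- fmt_dollar(signif(brks, 3))

# One label per interval, in the (lower,upper] style that cut() uses
labs <- paste0("(", head(shown, -1), ",", tail(shown, -1), "]")

# The unrounded breaks still define the bins; only the labels change
df$quant <- cut(df$income, breaks = brks, labels = labs)
```

    With dplyr loaded, df %>% count(quant) should then list the four intervals in quartile order. As with the original cut() call, the minimum income falls outside the first interval and becomes NA unless include.lowest = TRUE is passed.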