Search code examples
rdataframedplyrcutseq

How to extend the range of stratification beyond the maximum value in a data frame?


Suppose we start with the following data frame, with the generating code beneath it:

> stratData
  ID  Period Values
1  1 2020-03     -5
2  1 2020-04     25
3  2 2020-01     35
4  2 2020-02     45
5  2 2020-03     55
6  2 2020-04     87
7  3 2020-02     10
8  3 2020-03     20
9  3 2020-04     30

stratData <- 
   data.frame(
     ID = c(1,1,2,2,2,2,3,3,3),
     Period = c("2020-03", "2020-04", "2020-01", "2020-02", "2020-03", "2020-04", "2020-02", "2020-03", "2020-04"),
     Values = c(-5, 25, 35, 45, 55, 87, 10, 20, 30)
     )

I have a Shiny app that allows the user to stratify certain values in the data frame by different criteria, including the size of stratification bands. At the bottom is the stratification MWE code. The problem I'm having is the stratification range not properly labeling the maximum value in the range. As shown below, the maximum range shows NA, instead of the correct (85,90] to encompass the maximum from the stratData data frame. How can this be solved?

# A tibble: 7 x 5
  Range   Count Values Count_pct Values_pct
  <fct>   <dbl>  <dbl>     <dbl>      <dbl>
1 [25,35]     2     55      66.7       38.7
2 (35,45]     0      0       0          0  
3 (45,55]     0      0       0          0  
4 (55,65]     0      0       0          0  
5 (65,75]     0      0       0          0  
6 (75,85]     0      0       0          0  
7 NA          1     87      33.3       61.3

Here is the MWE code to generate the above table (there are vestiges from the more complete App; I thought about making the max = round(value, -1) or something like that but a difficulty is the user has the ability to input different ranges in order to customize the stratification):

custom_min <- function(x) {if (length(x)>0) min(x, na.rm=TRUE) else Inf}
custom_max <- function(x) {if (length(x)>0) max(x, na.rm=TRUE) else Inf}

filter_exp1 <- parse(text=paste0("Period",  "==", "'","2020-04", "'"))
stratData_1 <- stratData %>% filter(eval(filter_exp1))

min <- custom_min(stratData_1[[3]])
max <- custom_max(stratData_1[[3]])
breaks <- if(any(is.infinite(c(min,max)))) c(0, 10) else seq(min, max, by = 10) # < in full code, the 10 is a variable the user can change via Shiny

tmp <- stratData %>% 
  filter(eval(filter_exp1)) %>%
  mutate(Range = cut(!!sym("Values"), breaks=breaks, include.lowest=TRUE, right = TRUE, dig.lab = 5)) %>% 
  group_by(Range) %>% 
  summarise(Count = n(),Values = sum(!!sym("Values"))) %>% 
  complete(Range, fill = list(Count = 0,Values = 0)) %>% 
  ungroup %>% 
  mutate(Count_pct = Count/sum(Count)*100, Values_pct = Values/sum(Values)*100) %>% 
  dplyr::select(everything(), Count, Count_pct, Values, Values_pct)
tmp

Solution

  • You need to use e.g. seq(min, max, length.out = 5). The by argument must not be used here, because max=87 is not a multiple of 10:

    min <- 25
    max <- 87
    # does not include the max
    seq(min, max, by = 10)
    #> [1] 25 35 45 55 65 75 85
    # does include the max
    seq(min, max, length.out = 5)
    #> [1] 25.0 40.5 56.0 71.5 87.0
    

    Created on 2022-02-07 by the reprex package (v2.0.1)