
Using the cut function to label out-of-tolerance parameter values


A process parameter has upper and lower limits. Once the data are collected and stored in a vector, I want to use the cut function to recode the vector.

Here is how I did it (as an example):

x = mtcars$mpg

cut(x, breaks = c(-Inf,20, 30, Inf), labels = c("low","good","high"))

This works beautifully.

But when I tried to label both the too-high and too-low values as just "failure", I got an error message:

x = mtcars$mpg

cut(x, breaks = c(-Inf,20, 30, Inf), labels = c("failure","pass","failure"))

Error in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels) else paste0(labels,  : factor level [3] is duplicated

Evidently the cut function does not accept duplicated labels.

Is there a workaround for this?


Solution

  • If you want to continue using cut, one option is to change the levels after the call to cut:

    x1 <- cut(x, breaks = c(-Inf,20, 30, Inf), labels = c("low","good","high"))
    levels(x1) <- c("failure","pass","failure")
    

    However, instead of cut you can use a simple ifelse:

    ifelse(x >= 20 & x <= 30, "pass", "failure")
    

    Or index a vector of labels with the logical condition (FALSE + 1 is 1, TRUE + 1 is 2):

    c("failure", "pass")[(x >= 20 & x <= 30) + 1]
    

    Or, if there are multiple conditions to check, you can use case_when from dplyr, which lets you add further conditions as required.

    library(dplyr)
    mtcars %>%
      mutate(result = case_when(mpg >= 20 & mpg <= 30 ~ "pass", 
                                TRUE ~ "failure"))
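
One thing worth checking before choosing between these approaches is how values exactly on a limit are treated. The sketch below uses a small hypothetical vector (not the mtcars data) with values on and around the limits 20 and 30. It shows that assigning duplicated names through levels<- merges the levels, and that cut with its default right = TRUE puts 20 in the lowest interval, whereas the ifelse condition above counts 20 as a pass.

```r
# Hypothetical sample values chosen to sit on and around the limits 20 and 30.
x <- c(15, 20, 25, 30, 35)

# Approach 1: cut() with unique labels, then merge levels by renaming.
# With the default right = TRUE the intervals are (-Inf,20], (20,30], (30,Inf].
by_cut <- cut(x, breaks = c(-Inf, 20, 30, Inf), labels = c("low", "good", "high"))
levels(by_cut) <- c("failure", "pass", "failure")

# Approach 2: ifelse() with both limits inclusive.
by_ifelse <- ifelse(x >= 20 & x <= 30, "pass", "failure")

# The duplicated names given to levels<- collapse into a single level.
print(levels(by_cut))

# Side-by-side comparison; the two approaches disagree at x = 20.
print(data.frame(x, by_cut = as.character(by_cut), by_ifelse))
```

If a value equal to the lower limit should pass, the ifelse or case_when versions express that directly; with cut, the break points or the right argument would need adjusting to match whichever limits are meant to be inclusive.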