r, dplyr, case, mutate

In R create a new column from a column using the function case_when() with multiple conditional rules


I am trying to add a column. I have a numeric column “Y” with values ranging from -50 to 350, and I would like to create a new column “Z” that labels each value according to these conditions: -30 to 30 = “Transition”, 31 to 100 = “Early”, 101 to 200 = “Mid”, 201 to 300 = “Late”, and everything else “NA”. I am trying to do this with the case_when() function inside mutate() from dplyr (see code below), but I keep getting an error message. Any help will be very much appreciated.

DataSetNew <- DataSet %>%
dplyr::mutate(ColumnZ = case_when(
ColumnY == < = 30 ~ "Transition",
ColumnY == between(31,100) ~ "Early",
ColumnY == between(101,200) ~ "Mid",
ColumnY == between(201,305) ~ "Late",
TRUE ~ "NA"
))
Error: unexpected '<' in:
"  dplyr::mutate(ColumnZ = case_when(
ColumnY == <"

Solution

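  • You need to pass the column of interest as the first argument of between(), i.e. between(ColumnY, 31, 100) rather than ColumnY == between(31, 100). The error itself comes from `== < =`, which is not valid R syntax; the operator is written `<=`. In your question you state that 201 to 300 is "Late", but in your code the upper threshold for "Late" is 305. This example uses the former.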

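    Also, instead of TRUE ~ as the catch-all for all other values, the current recommendation (since dplyr 1.1.0) is to use the .default argument. Note that .default = NA returns a true missing value rather than the string "NA".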

    library(dplyr)
    
    # Sample data
    DataSet <- data.frame(id = 1:9,
                          ColumnY = c(-30, 30, 31, 100, 101, 200, 201, 300, 301))
    
    # Return ColumnZ
    DataSetNew <- DataSet |>
      mutate(ColumnZ = case_when(between(ColumnY, -Inf, 30) ~ "Transition",
                                 between(ColumnY, 31, 100) ~ "Early",
                                 between(ColumnY, 101, 200) ~ "Mid",
                                 between(ColumnY, 201, 300)  ~ "Late",
                                 .default = NA))
    
    DataSetNew
    #   id ColumnY    ColumnZ
    # 1  1     -30 Transition
    # 2  2      30 Transition
    # 3  3      31      Early
    # 4  4     100      Early
    # 5  5     101        Mid
    # 6  6     200        Mid
    # 7  7     201       Late
    # 8  8     300       Late
    # 9  9     301       <NA>
    

    This is equivalent to:

    DataSetNew <- DataSet |>
      mutate(ColumnZ = case_when(ColumnY <= 30 ~ "Transition",
                                 ColumnY >= 31 & ColumnY <= 100 ~ "Early",
                                 ColumnY >= 101 & ColumnY <= 200 ~ "Mid",
                                 ColumnY >= 201 & ColumnY <= 300  ~ "Late",
                                 .default = NA))
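
    One caveat, in case ColumnY can hold non-integer values (the question only says "numeric"): both versions above leave gaps between the ranges, so a value such as 30.5 matches no condition and ends up as NA. Because case_when() checks the conditions in order and uses the first one that is TRUE, a rough sketch that closes the gaps needs only the upper bounds. The sample data and the names DataSetFrac and DataSetFracNew are made up for illustration, and assigning fractional values to the next band up is an assumption, not something stated in the question.

    ```r
    library(dplyr)

    # Hypothetical sample data that includes non-integer values
    DataSetFrac <- data.frame(ColumnY = c(-30, 30, 30.5, 100, 100.5, 200.5, 300, 300.5))

    # Conditions are evaluated top to bottom; the first TRUE wins,
    # so each line is only reached when the lines above it were not TRUE
    DataSetFracNew <- DataSetFrac |>
      mutate(ColumnZ = case_when(ColumnY <= 30  ~ "Transition",
                                 ColumnY <= 100 ~ "Early",  # implies ColumnY > 30
                                 ColumnY <= 200 ~ "Mid",    # implies ColumnY > 100
                                 ColumnY <= 300 ~ "Late",   # implies ColumnY > 200
                                 .default = NA))            # anything above 300
    ```

    With this ordering 30.5 is labelled "Early" and 300.5 falls through to NA. If the gaps are intentional (for example, the data are always whole numbers), the between() version above is the more explicit choice.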