r dplyr r-factor

Create new variable based on a condition across multiple columns


I have a binary variable ("Penalty") and 30 factors with the same levels: "Discharge", "Suspended", "Fine", "Community order", and "Imprisonment".

A small example:

ID  Possession    Importation  Production       Penalty
1   Fine          NA           Fine             Yes
2   NA            NA           Community order  No
3   Discharge     Discharge    NA               No
4   NA            NA           Suspended        Yes
5   Imprisonment  NA           NA               No
6   Fine          NA           Imprisonment     No
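
To make the example reproducible, the table above could be built roughly like this. This is only a sketch: the names `sanction_levels` and `offence_cols` are made up for illustration, and the order of the factor levels is assumed from the order in which they are listed above.

```r
# Shared levels, in the order listed in the question (assumed)
sanction_levels <- c("Discharge", "Suspended", "Fine", "Community order", "Imprisonment")

data <- data.frame(
  ID = 1:6,
  Possession  = c("Fine", NA, "Discharge", NA, "Imprisonment", "Fine"),
  Importation = c(NA, NA, "Discharge", NA, NA, NA),
  Production  = c("Fine", "Community order", NA, "Suspended", NA, "Imprisonment"),
  Penalty     = c("Yes", "No", "No", "Yes", "No", "No")
)

# Convert the three offence columns to factors that share the same levels
offence_cols <- c("Possession", "Importation", "Production")
data[offence_cols] <- lapply(data[offence_cols], factor, levels = sanction_levels)
```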

I would like to create a new factor, 'Sentence', that combines these columns with the binary variable. Where a row contains differing levels, the new variable should keep the level with the highest priority, in this order: Imprisonment > Community order, Suspended > Fine > Discharge. For example, Discharge should appear in the new column only where no other level appears in that row.

Desired output:

ID  Possession    Importation  Production       Penalty  Sentence
1   Fine          NA           Fine             Yes      Fine
2   NA            NA           Community order  No       Community order
3   Discharge     Discharge    NA               No       Discharge
4   NA            NA           Suspended        Yes      Suspended
5   Imprisonment  NA           NA               No       Imprisonment
6   Fine          NA           Imprisonment     No       Imprisonment

This is what I have attempted (where "vec" is a vector of the factor column indices):

data <- data %>%
  mutate(
    crim_sanct = case_when(
      (if_any(vec) == "Discharge") ~ "Discharge",
      (if_any(vec) == "Fine") | (data$Penalty == "Yes") ~ "Fine",
      (if_any(vec) ==  "Suspended") ~ "Suspended",
      (if_any(vec) ==  "Community order") ~ "Community order",
      (if_any(vec) ==  "Imprisonment") ~ "imprisonment"))

Solution

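  • You are heading in the right direction, but the if_any syntax is slightly off. The comparison has to go inside if_any as a function, e.g. if_any(cols, ~ . == "Fine"), rather than comparing the result of if_any(vec) to a string. Inside mutate you can also refer to Penalty directly instead of data$Penalty.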

    Also, case_when returns the value for the first condition that is TRUE, so the conditions need to be listed in priority order, highest first. Since Imprisonment > Community order, the Imprisonment condition has to come before the Community order condition.

    library(dplyr)
    
    data <- data %>%
      mutate(
        crim_sanct = case_when(
          # Conditions run from highest to lowest priority; the first TRUE one wins.
          # Community order is placed above Suspended here (the question lists them together).
          if_any(Possession:Production, ~ . == "Imprisonment") ~ "Imprisonment",
          if_any(Possession:Production, ~ . == "Community order") ~ "Community order",
          if_any(Possession:Production, ~ . == "Suspended") ~ "Suspended",
          if_any(Possession:Production, ~ . == "Fine") | (Penalty == "Yes") ~ "Fine",
          if_any(Possession:Production, ~ . == "Discharge") ~ "Discharge"
        )
      )
    data
    
    #  ID   Possession Importation      Production Penalty      crim_sanct
    #1  1         Fine        <NA>            Fine     Yes            Fine
    #2  2         <NA>        <NA> Community order      No Community order
    #3  3    Discharge   Discharge            <NA>      No       Discharge
    #4  4         <NA>        <NA>       Suspended     Yes       Suspended
    #5  5 Imprisonment        <NA>            <NA>      No    Imprisonment
    #6  6         Fine        <NA>    Imprisonment      No    Imprisonment
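
With 30 columns it may be easier to encode the priority once instead of writing one condition per level. Below is a rough base R sketch of that idea, not the only way to do it: each value is turned into its rank in a priority vector and the highest rank per row is kept. The names `priority`, `sanction_cols`, `ranks`, `top` and `fine_floor` are made up for illustration, and ranking Community order above Suspended is an assumption, since the question lists the two together.

```r
# Example data from the question, kept as character columns for brevity
data <- data.frame(
  ID = 1:6,
  Possession  = c("Fine", NA, "Discharge", NA, "Imprisonment", "Fine"),
  Importation = c(NA, NA, "Discharge", NA, NA, NA),
  Production  = c("Fine", "Community order", NA, "Suspended", NA, "Imprisonment"),
  Penalty     = c("Yes", "No", "No", "Yes", "No", "No")
)

# Lowest to highest priority. ASSUMPTION: Community order outranks Suspended.
priority <- c("Discharge", "Fine", "Suspended", "Community order", "Imprisonment")

# Hypothetical name for the sanction columns (30 of them in the real data)
sanction_cols <- c("Possession", "Importation", "Production")

# Position of each value in `priority`; NA stays NA
ranks <- lapply(data[sanction_cols], match, table = priority)

# Row-wise maximum rank, ignoring NAs
top <- do.call(pmax, c(ranks, na.rm = TRUE))

# Penalty == "Yes" counts as at least a Fine, as in the case_when version
fine_floor <- ifelse(data$Penalty == "Yes", match("Fine", priority), NA_integer_)
top <- pmax(top, fine_floor, na.rm = TRUE)

# Map the winning rank back to its label and store it as a factor
data$Sentence <- factor(priority[top], levels = priority)
```

A row with no sanction in any column and Penalty == "No" ends up as NA in Sentence. Changing the priority then only means reordering the `priority` vector.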