r, if-statement, boolean

Boolean based If condition in R


The code below is meant to fill the 3rd column based on the value in the 2nd column, using if conditions. Any idea why it does not work? Are there any alternatives?

code:

Num <- 1:8
Name <- c("Jim","Jim","James","Jim","Jim","James", "Sara", "Sara")
Loc <- 0
myDF <- data.frame(Num, Name,Loc)

  
for (x in 1:nrow(myDF)){ 
  if (grepl("Jim", myDF[x,2])) == TRUE {
    myDF[x,3]) = "CA"
  }
  if (grepl("James", myDF[x,2])) == TRUE {
    myDF[x,3]) = "TX"
  }  
  if (grepl("Sara", myDF[x,2])) == TRUE {
    myDF[x,3]) = "MN"
  }  
}
 

Solution

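  • Your posted code has a few syntax errors: the == TRUE comparison sits outside the parentheses of each if condition, and there is a stray ) after myDF[x,3]. Since grepl() already returns a logical value, the == TRUE is not needed at all. This works: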

    for (x in 1:nrow(myDF)){ 
      if (grepl("Jim", myDF[x,2])) {
        myDF[x,3] = "CA"
      }
      if (grepl("James", myDF[x,2])) {
        myDF[x,3] = "TX"
      }  
      if (grepl("Sara", myDF[x,2])) {
        myDF[x,3] = "MN"
      }  
    }
    

    A solution using the dplyr package (case_match requires dplyr 1.1.0 or later) would be:

    library(dplyr)
    
    myDF |>
      mutate(Loc = case_match(Name,
                              "Jim" ~ "CA",
                              "James" ~ "TX",
                              "Sara" ~ "MN",
                              .default = "Unknown"))
    

    If you need to match with a regular expression, use case_when instead (pseudo code below):

    case_when(grepl("Jim", Name) ~ "CA",...)
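
Filled out, the case_when pseudo code above might look like the sketch below. It assumes dplyr 1.1.0 or later (needed for .default) and R 4.1 or later (for the |> pipe), and it rebuilds the question's myDF so that it runs on its own. The ^ and $ anchors are an addition for illustration, not part of the original answer; they keep "Jim" from also matching a longer name such as "Jimmy".

```r
library(dplyr)

# Rebuild the example data from the question
myDF <- data.frame(
  Num  = 1:8,
  Name = c("Jim", "Jim", "James", "Jim", "Jim", "James", "Sara", "Sara")
)

# Conditions are checked in order; the first one that is TRUE wins
result <- myDF |>
  mutate(Loc = case_when(
    grepl("^Jim$", Name)   ~ "CA",
    grepl("^James$", Name) ~ "TX",
    grepl("^Sara$", Name)  ~ "MN",
    .default = "Unknown"
  ))
```

Because the first matching condition wins, more specific patterns should come before more general ones.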
    

    In base R:

    myDF$Loc <- with(myDF, ifelse(grepl("Jim", Name), "CA", 
                                  ifelse(grepl("James", Name), "TX", 
                                         ifelse(grepl("Sara", Name), "MN", "Unknown")))
    )
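
If the nested ifelse calls become hard to read, one possible alternative in base R is to set a default and then overwrite the matching rows with logical indexing. This is only a sketch, not part of the original answer; it rebuilds the question's myDF so that it runs on its own.

```r
# Rebuild the example data from the question
myDF <- data.frame(
  Num  = 1:8,
  Name = c("Jim", "Jim", "James", "Jim", "Jim", "James", "Sara", "Sara")
)

# Start with the fallback value, then overwrite the rows that match
myDF$Loc <- "Unknown"
myDF$Loc[grepl("Jim", myDF$Name)]   <- "CA"
myDF$Loc[grepl("James", myDF$Name)] <- "TX"
myDF$Loc[grepl("Sara", myDF$Name)]  <- "MN"
```

Note the difference in precedence: with nested ifelse the first matching pattern wins, whereas here a later assignment overwrites an earlier one if a name matches more than one pattern.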
    

    Or with a named vector (if you do not need a regular expression):

    lookup <- c("Jim" = "CA", "James" = "TX", "Sara" = "MN")
    
    myDF$Loc <- lookup[myDF$Name]
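
Two details of the named-vector lookup are worth guarding against: a name that is missing from lookup comes back as NA, and a factor column would be indexed by its integer codes instead of its labels. The sketch below handles both. The name "Bob" is a hypothetical extra row added here only to show the fallback; it is not in the question's data.

```r
# Example data; "Bob" is a hypothetical name with no entry in the lookup
myDF <- data.frame(
  Num  = 1:8,
  Name = c("Jim", "Jim", "James", "Jim", "Jim", "James", "Sara", "Bob")
)

lookup <- c("Jim" = "CA", "James" = "TX", "Sara" = "MN")

# as.character() guards against Name being a factor, which would be
# indexed by its integer codes rather than by its labels
loc <- unname(lookup[as.character(myDF$Name)])

# Names missing from the lookup come back as NA; give them a fallback
loc[is.na(loc)] <- "Unknown"
myDF$Loc <- loc
```

unname() drops the names that the lookup result would otherwise carry over from the lookup vector.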