Tags: r, regex, function, loops, case-when

How to use case_when with loop and regex?


I want to generate a new variable named new_p that, for each row, takes the value of the p_ column matching that row's d1. The column name is built with paste0, e.g. p_A for rows where d1 == "A". My approach below produces the new variable but does not assign the values correctly: it fills in only one value and returns NA for the other values of lev. Thanks for any help.

My data:

tempDF <- structure(list(d1 = c("A", "B", "C"), d2 = c(40L, 50L, 20L), 
    d3 = c(20L, 40L, 50L), d4 = c(60L, 30L, 30L), p_A = c(1L, 
    3L, 2L), p_B = c(3L, 4L, 3L), p_C = c(2L, 1L, 1L), p4 = c(5L, 
    5L, 4L)), class = "data.frame", row.names = c(NA, -3L))

lev<-levels(as.factor(tempDF$d1))

View(tempDF) 

My approach:

library(dplyr)

for(i in seq_along(lev)){

  func<-function(tempDF, i, lev){
    newDT<-tempDF%>%
      mutate(.,
             new_p = case_when(
               d1 == paste0(lev[i]) ~ .[, paste0("p", "_", lev[i])]
             ))%>%
      as.data.frame(.)
  }

  # each pass starts again from tempDF and overwrites newDT
  newDT<-func(tempDF, i, lev) %>%
    as.data.frame(.)

}

View(newDT)

newDT
  d1 d2 d3 d4 p_A p_B p_C p4 new_p
   A 40 20 60   1   3   2  5 NA
   B 50 40 30   3   4   1  5 NA
   C 20 50 30   2   3   1  4 1
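The NAs come from the loop itself: every pass calls func(tempDF, i, lev) on the untouched tempDF and overwrites newDT, and case_when() with a single condition and no default returns NA for every row that does not match. So only the last level, C, survives. A minimal base-R sketch that keeps a loop over lev but fills one column across all passes (it rebuilds the question's data with data.frame() so it runs on its own):

```r
# Same values as the question's tempDF
tempDF <- data.frame(
  d1  = c("A", "B", "C"),
  d2  = c(40L, 50L, 20L),
  d3  = c(20L, 40L, 50L),
  d4  = c(60L, 30L, 30L),
  p_A = c(1L, 3L, 2L),
  p_B = c(3L, 4L, 3L),
  p_C = c(2L, 1L, 1L),
  p4  = c(5L, 5L, 4L)
)
lev <- levels(as.factor(tempDF$d1))

# Start with an all-NA integer column, then fill it one level at a time
tempDF$new_p <- NA_integer_
for (l in lev) {
  rows <- tempDF$d1 == l                      # rows whose d1 is this level
  col  <- paste0("p", "_", l)                 # e.g. "p_A"
  tempDF$new_p[rows] <- tempDF[[col]][rows]   # copy only those rows
}
tempDF$new_p
# [1] 1 4 1
```

Because every pass writes into the same tempDF$new_p, values from earlier levels are kept instead of being replaced.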

Expected output:

newDT
  d1 d2 d3 d4 p_A p_B p_C p4 new_p
   A 40 20 60   1   3   2  5 1
   B 50 40 30   3   4   1  5 4
   C 20 50 30   2   3   1  4 1
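Put differently, row r should take its value from the column named paste0("p_", d1[r]). A small row-by-row sketch of exactly that rule with vapply(), on the data trimmed to the columns that matter; lookup_p is a hypothetical helper name, and this is easy to read rather than fast on large data:

```r
tempDF <- data.frame(
  d1  = c("A", "B", "C"),
  p_A = c(1L, 3L, 2L),
  p_B = c(3L, 4L, 3L),
  p_C = c(2L, 1L, 1L)
)

# Hypothetical helper: for each row, read column "p_<d1>" at that row
lookup_p <- function(df) {
  vapply(seq_len(nrow(df)), function(r) {
    col <- paste0("p_", df$d1[r])   # column named after this row's d1
    df[[col]][r]                    # value of that column in this row
  }, integer(1))
}

tempDF$new_p <- lookup_p(tempDF)
tempDF$new_p
# [1] 1 4 1
```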

Edit: Barradas's function applied to a larger dataset:

tempDF <- structure(list(d1 = c("A", "B", "C", "A", "C"), d2 = c(40L, 50L, 20L, 50L, 20L), 
    d3 = c(20L, 40L, 50L, 40L, 50L), d4 = c(60L, 30L, 30L,60L, 30L), p_A = c(1L, 
    3L, 2L, 3L, 2L), p_B = c(3L, 4L, 3L, 3L, 4L), p_C = c(2L, 1L, 1L,2L, 1L), p4 = c(5L, 
    5L, 4L, 5L, 4L)), class = "data.frame", row.names = c(NA, -5L))

View(tempDF)    

lev<-levels(as.factor(tempDF$d1))

func <- function(tempDF, lev){
  i <- match(tempDF$d1, lev)
  j <- match(paste0("p", "_", lev), names(tempDF))
  tempDF$new_p <- tempDF[cbind(i, j)]
  tempDF
}

newDT <- func(tempDF, lev)

Warning message:
In cbind(i, j) :
  number of rows of result is not a multiple of vector length (arg 2)

View(newDT)

newDT
  d1 d2 d3 d4 p_A p_B p_C p4 new_p
   A 40 20 60   1   3   2  5     1
   B 50 40 30   3   4   1  5     4
   C 20 50 30   2   3   1  4     1
   A 50 40 60   3   3   2  5     1  //wrong, new_p should be 3, not 1
   C 20 50 30   2   4   1  4     3  //wrong, new_p should be 1, not 3
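Why rows 4 and 5 go wrong: match(tempDF$d1, lev) returns the level number of each row (1, 2, 3, 1, 3), not the row number, and j has only three entries, so cbind(i, j) recycles it (hence the warning). Row 4 is therefore read from cell (1, p_A) and row 5 from cell (3, p_B). A sketch of a vectorized fix, assuming every d1 value has a matching p_ column: use the row numbers as the row index and pick one column per row. Indexing a matrix of only the p_ columns also keeps the result integer. The name add_new_p is hypothetical.

```r
# The larger data from the edit
tempDF <- data.frame(
  d1  = c("A", "B", "C", "A", "C"),
  d2  = c(40L, 50L, 20L, 50L, 20L),
  d3  = c(20L, 40L, 50L, 40L, 50L),
  d4  = c(60L, 30L, 30L, 60L, 30L),
  p_A = c(1L, 3L, 2L, 3L, 2L),
  p_B = c(3L, 4L, 3L, 3L, 4L),
  p_C = c(2L, 1L, 1L, 2L, 1L),
  p4  = c(5L, 5L, 4L, 5L, 4L)
)
lev <- levels(as.factor(tempDF$d1))

# Hypothetical helper name
add_new_p <- function(DF, levs) {
  p_cols <- paste0("p", "_", levs)   # "p_A" "p_B" "p_C"
  m <- as.matrix(DF[p_cols])         # integer matrix of the p_ columns only
  i <- seq_len(nrow(DF))             # row index: one entry per row
  j <- match(DF$d1, levs)            # column of m to read for each row
  DF$new_p <- m[cbind(i, j)]         # one cell per row, no recycling
  DF
}

newDT <- add_new_p(tempDF, lev)
newDT$new_p
# [1] 1 4 1 3 1
```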

Solution

  • You need neither loops nor pipes for what the question asks; match() and plain matrix-style indexing of the data frame solve the problem.

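    func <- function(tempDF, lev){
      # level number of each row; it doubles as the row number only
      # when row k holds the k-th level (true for the 3-row data)
      i <- match(tempDF$d1, lev)
      j <- match(paste0("p", "_", lev), names(tempDF))
      tempDF$new_p <- tempDF[cbind(i, j)]
      tempDF
    }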
    
    newDT <- func(tempDF, lev)
    
    newDT
    #  d1 d2 d3 d4 p_A p_B p_C p4 new_p
    #1  A 40 20 60   1   3   2  5     1
    #2  B 50 40 30   3   4   1  5     4
    #3  C 20 50 30   2   3   1  4     1
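One thing to be aware of with tempDF[cbind(i, j)]: matrix-style indexing of a data frame goes through as.matrix(tempDF), and because d1 is a character column the whole matrix becomes character, so new_p prints like a number but is stored as a string. A quick check, plus one way around it (index only the integer p_ columns):

```r
tempDF <- data.frame(
  d1  = c("A", "B", "C"),
  p_A = c(1L, 3L, 2L),
  p_B = c(3L, 4L, 3L),
  p_C = c(2L, 1L, 1L)
)
idx <- cbind(1:3, 2:4)   # cells (1, p_A), (2, p_B), (3, p_C)

# Whole data frame: as.matrix() turns every column into character
via_df <- tempDF[idx]
class(via_df)
# [1] "character"

# Only the integer p_ columns: the values stay integer
p_cols <- c("p_A", "p_B", "p_C")
via_p <- as.matrix(tempDF[p_cols])[cbind(1:3, 1:3)]
class(via_p)
# [1] "integer"
```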
    

    Edit.

    The following function returns the correct output with both the original data and the bigger one.

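    func <- function(DF, levs){
      # row numbers of each level; lapply() always returns one list element per level
      i <- lapply(levs, function(l) which(DF$d1 == l))
      # column index of p_<level>, repeated once per row of that level
      j <- rep(match(paste0("p", "_", levs), names(DF)), lengths(i))
      i <- unlist(i)
      o <- cbind(i, j)
      # put the (row, column) pairs back in row order
      o <- o[order(o[, 1]), , drop = FALSE]
      # matrix indexing goes through as.matrix(DF), so new_p is character here
      DF$new_p <- DF[o]
      DF
    }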
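Why lapply() is the safer choice when collecting the row numbers per level: sapply() simplifies its result, so when every level occurs the same number of times it returns a matrix instead of a list. lengths() on that matrix then gives one entry per cell rather than the per-level counts, and the rep() call no longer lines up with the levels. A small demonstration with a hypothetical d1 vector:

```r
d1   <- c("A", "B", "A", "B")   # hypothetical: each level appears twice
levs <- c("A", "B")

# sapply() simplifies the two equal-length results into a 2 x 2 matrix
s <- sapply(levs, function(lv) which(d1 == lv))
is.matrix(s)
# [1] TRUE

# lapply() keeps one list element per level, so lengths() counts rows per level
idx <- lapply(levs, function(lv) which(d1 == lv))
lengths(idx)
# [1] 2 2
```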