
How to create a function with input from a dataframe and apply it over all rows?


I am trying to write a function in R that takes several variables from a dataframe as input and returns a vector of results as output.

I wrote the function below based on this post: "How can create a function using variables in a dataframe"

However, I receive this warning message:

the condition has length > 1 and only the first element will be used

I tried to solve it by using sapply inside the function, as suggested in the post below, but I did not succeed. https://datascience.stackexchange.com/questions/33351/what-is-the-problem-with-the-condition-has-length-1-and-only-the-first-elemen

# a data frame with columns a, x, y and z:

myData <- data.frame(a=1:5,
                     x=(2:6),
                     y=(11:15),
                     z=3:7)


myFun3 <- function(df, col1 = "x", col2 = "y", col3 = "z"){      
   result <- 0      
   if(df[,col1] == 2){result <- result + 10
   }      
   if(df[,col2] == 11){result <- result + 100
   }      
   return(result)
}

myFun3(myData)

>    Warning messages:
>    1: In if (df[, col1] == 2) { :
>      the condition has length > 1 and only the first element will be used
>    2: In if (df[, col2] == 11) { :
>      the condition has length > 1 and only the first element will be used

Can someone explain how I can apply the function over all rows of the dataframe? Thanks a lot!


Solution

  • We need ifelse instead of if/else, because if/else is not vectorized: it expects a single TRUE or FALSE, whereas df[[col1]] == 2 returns one value per row

    myFun3 <- function(df, col1 = "x", col2 = "y", col3 = "z") {
      result <- numeric(nrow(df))
      ifelse(df[[col1]] == 2, result + 10,
             ifelse(df[[col2]] == 11, result + 100, result))
    }
    
    myFun3(myData)
    #[1] 10  0  0  0  0
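
As a minimal sketch of why the warning appears and how ifelse avoids it: the comparison on a column returns one logical value per row, while `if` looks at a single value only (in R 4.2.0 and later this case is an error rather than a warning). The names `first_match` and `both_added` are illustrative only. The second variant is an assumption about an alternative reading of the question's code, where both conditions add to `result` instead of only the first match counting.

```r
# Same example data as in the question
myData <- data.frame(a = 1:5, x = 2:6, y = 11:15, z = 3:7)

# The comparison yields one logical value per row, not a single TRUE/FALSE
cond <- myData$x == 2
length(cond)

# ifelse() chooses a value for every element of the condition;
# the first condition that is TRUE for a row determines that row's result
first_match <- ifelse(myData$x == 2, 10,
                      ifelse(myData$y == 11, 100, 0))

# Assumed variant: accumulate both contributions per row,
# relying on TRUE/FALSE being treated as 1/0 in arithmetic
both_added <- 10 * (myData$x == 2) + 100 * (myData$y == 11)
```

Only the first row satisfies either condition, so `first_match` gives 10 there, while `both_added` gives 110 because both conditions hold for that row.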
    

    Alternatively, the OP's code can be vectorized with Vectorize after a small change, i.e. replace the second if with an else if, so that the conditions form an else if ladder

    myFun3 <- Vectorize(function(x, y) {
      result <- 0
      if(x == 2) {
        result <- result + 10
      } else if(y == 11) {
        result <- result + 100
      } else result <- 0
      return(result)
    })
    myFun3(myData$x, myData$y)
    #[1] 10  0  0  0  0
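
Vectorize builds a wrapper around mapply, so the same row-by-row application can also be written with mapply directly. The following is a sketch only; `score_row` and `score_vec` are hypothetical names, not part of the original answer.

```r
myData <- data.frame(a = 1:5, x = 2:6, y = 11:15, z = 3:7)

# Scalar function: expects a single x and a single y
score_row <- function(x, y) {
  if (x == 2) {
    10
  } else if (y == 11) {
    100
  } else {
    0
  }
}

# mapply() calls score_row once per pair (x[i], y[i])
res_mapply <- mapply(score_row, myData$x, myData$y)

# Vectorize() returns a function that performs the same element-wise calls
score_vec <- Vectorize(score_row)
res_vectorize <- score_vec(myData$x, myData$y)
```

Both calls evaluate the scalar function once per row, which is convenient for logic that is hard to express with ifelse, although it is usually slower than a truly vectorized expression on large data.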
    

    Regarding the OP's question about the case where several conditions are TRUE and only the first should take effect: both a nested ifelse (nested when there are more than two outcomes) and if/else if/else (an else if ladder, or equivalently nested if/else) work, because the conditions are checked in the order in which they are written and the first one that is TRUE determines the result. For example, suppose we have multiple conditions

    if(expr1) {
      1
    } else if(expr2) {
      2
    } else if(expr3) {
      3
    } else if(expr4) {
      4
    } else {
      5
    }
    

    This checks the first expression ('expr1') first, followed by the second, and so on. As soon as one of them returns TRUE, it exits, i.e. it is equivalent to a nested condition

    if(expr1) {
      1
    } else {
      if(expr2) {
        2
      } else {
        if(expr3) {
          3
        } else {
          if(expr4) {
            4
          } else 5
        }
      }
    }
    

    There is a cost to this: wherever a value matches the first case (1), only expr1 is evaluated, which saves time, but the more values fall into the last case (5), the more often all four conditions have to be checked
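
The evaluation order of an else if ladder can be made visible with a small sketch. The helper `check`, the environment `log_env`, and the function `classify` are hypothetical and exist only to record which conditions were actually evaluated; the concrete tests (`v == 1` and so on) are placeholders for expr1 to expr4.

```r
# Hypothetical log of the conditions that were evaluated
log_env <- new.env()
log_env$seen <- character(0)

# Records the label of a condition, then returns the condition's value
check <- function(label, value) {
  log_env$seen <- c(log_env$seen, label)
  value
}

classify <- function(v) {
  if (check("expr1", v == 1)) {
    1
  } else if (check("expr2", v == 2)) {
    2
  } else if (check("expr3", v == 3)) {
    3
  } else if (check("expr4", v == 4)) {
    4
  } else {
    5
  }
}

# A value matching the first condition: the ladder exits after expr1
log_env$seen <- character(0)
r1 <- classify(1)
checks_for_1 <- log_env$seen

# A value matching no condition: every expression is checked before the final else
log_env$seen <- character(0)
r5 <- classify(9)
checks_for_5 <- log_env$seen
```

Here `checks_for_1` contains only "expr1", whereas `checks_for_5` contains all four labels, which illustrates the cost described above.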