Tags: r, dataframe, dplyr, mutate

Faster way to add a calculated column


I have a data frame in which I want to check a condition and add a new column based on the result.

This is my input data

InputData = data.frame(A = c("", "", "Apple"), B = c("", "", "Orange"), C = c("", "", ""), D = c(0, 1, 1))

This is how I currently produce the desired output

library(dplyr)

OutputData = InputData %>%
  mutate(R = case_when(A=='' & B=='' & C=='' & D==0 ~ "Yes", TRUE ~ "No"))

I have tried mutate() with case_when(). It works fine, but it gets slow as the number of rows grows.

Is there a faster way to do this?


Solution

  • I'm surprised that your code is slow with such small data (only 100k rows). I would do it like this:

    InputData$R <- "No"
    InputData[InputData$A == '' & InputData$B == '' &
                InputData$C == '' & InputData$D == 0, "R"] <- "Yes"
    

    However, I strongly recommend using logical values instead of "Yes"/"No":

    InputData$S <- InputData$A == '' & InputData$B == '' &
      InputData$C == '' & InputData$D == 0
    #      A      B C D   R     S
    #1                0 Yes  TRUE
    #2                1  No FALSE
    #3 Apple Orange   1  No FALSE
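
    As a rough illustration of why a logical column is convenient, here is a minimal sketch that rebuilds the example data from the question so it runs on its own. The names n_match, matching and others are made up for this example:

    ```r
    # Rebuild the question's example data so the snippet is self-contained
    InputData <- data.frame(A = c("", "", "Apple"), B = c("", "", "Orange"),
                            C = c("", "", ""), D = c(0, 1, 1))

    # Logical flag: TRUE where A, B and C are empty and D is 0
    InputData$S <- InputData$A == "" & InputData$B == "" &
      InputData$C == "" & InputData$D == 0

    # A logical column can be used directly, without comparing strings
    n_match  <- sum(InputData$S)           # number of rows meeting the condition
    matching <- InputData[InputData$S, ]   # rows where the condition holds
    others   <- InputData[!InputData$S, ]  # rows where it does not
    ```

    With "Yes"/"No" you would have to write InputData$R == "Yes" in each of these places.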
    

    If that is still too slow, the data.table package can help, but it shouldn't be necessary unless the data gets really large.
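
    For reference, a data.table version might look roughly like the sketch below. It assumes the data.table package is installed, rebuilds the question's example data, and uses DT as an illustrative name. I have not benchmarked it against the base R code above:

    ```r
    library(data.table)

    # Rebuild the question's example data so the snippet is self-contained
    InputData <- data.frame(A = c("", "", "Apple"), B = c("", "", "Orange"),
                            C = c("", "", ""), D = c(0, 1, 1))

    # as.data.table() makes a copy; setDT(InputData) would convert in place instead
    DT <- as.data.table(InputData)

    # := adds or updates a column by reference, without copying the whole table
    DT[, R := "No"]
    DT[A == "" & B == "" & C == "" & D == 0, R := "Yes"]
    ```

    The logic is the same as in the base R version: fill the column with "No", then overwrite the rows that meet the condition.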