How to avoid this `for` loop when operating on a list


I have a problem in R: I need to perform some operations on a list and create a new data frame from the resulting values. With a for loop this takes a very long time. I don't know how to avoid the for loop, or how to do the "if + case_when" part without it.

The comments in the code below explain what I do and what happens.

Thanks a lot!!

#search in all rows of list "total"
for(i in 1:nrow(total)) {

  #Use total$Cad[[i]] to look up the matching rows in another list (posdi)
  val1 <- posdi[posdi$cad == str_to_upper(total$Cad[[i]]),]

  #Keep only the rows of val1 whose "font" value is equal to "Taake"
  val2 <- val1[val1$font == "Taake",]

  #Format date value
  thedate <- as.numeric(format(as.Date(total$TheDate[[i]], format="%Y-%m-%d"), '%Y%m%d'))

  #This is where I get stuck. I want an IF that applies a different
  #case_when depending on whether dia is between 1 and 5 or between 6 and 7
  if(total$dia[[i]] >= 1 & total$dia[[i]] <= 5) {
    fran = case_when(
      total$secs[[i]]>=0 & total$secs[[i]]<1.5 ~ 1,
      total$secs[[i]]>=1.5 & total$secs[[i]]<4 ~ 2,
      total$secs[[i]]>=4 & total$secs[[i]]<8 ~ 3,
      total$secs[[i]]>=8 & total$secs[[i]]<10 ~ 4)
  } else {
    fran = case_when(
      total$secs[[i]]>=0 & total$secs[[i]]<1.5 ~ 5,
      total$secs[[i]]>=1.5 & total$secs[[i]]<4 ~ 6,
      total$secs[[i]]>=4 & total$secs[[i]]<8 ~ 7,
      total$secs[[i]]>=8 & total$secs[[i]]<10 ~ 8)
  }
  
  #Finally, add that "fran" value, the three values from the beginning and some values from the total list to a new data frame
  datosTel[nrow(datosTel) + 1,] = c(val2$cad, str_to_upper(total$Camp[[i]]), total$numsem[[i]], thedate, total$diasem[[i]], fran, 0)
}
#It works with the "for" loop, but it takes a very long time (it goes row by row, and the list has more than 200K rows).
#How can I do this without the for loop and still get the "if + case_when" right?

Thanks again and have a good day

As said before, my problems are the FOR loop and the IF and CASE_WHEN inside it, because I don't know how to write them without a loop.


Solution

  • The code inside your loop only touches the current element ([[i]]), and all operations you are performing are vectorised by default (except for the if, but we can replace that directly by if_else).

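    So you can replace the entire loop with a mutate or transmute statement. They do the same thing, except that transmute drops the existing columns, so it seems more appropriate in your case. (As far as I know, since dplyr 1.1.0 transmute is marked as superseded by mutate(.keep = "none"), but it still works.)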

    Furthermore, you can simplify the if by merging the two branches and adding an offset that depends on total$dia.

    Lastly, your case_when expression happens to be expressible as a findInterval expression.

    In the following I am assuming that datosTel is an empty table before your loop, and I am also making some assumptions about the column names which you may need to adjust.

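    # Rows of posdi with font "Taake", used as the lookup table for cad.
    taake = posdi[posdi$font == "Taake", ]

    datosTel = total %>%
      transmute(
        # match() gives, for each row, the position of Cad in taake$cad
        # (NA if absent; the first match is used if there are several).
        cad = taake$cad[match(str_to_upper(Cad), taake$cad)],
        Camp = str_to_upper(Camp),
        numsem = numsem,
        thedate = as.numeric(format(as.Date(TheDate, format="%Y-%m-%d"), '%Y%m%d')),
        diasem = diasem,
        offset = if_else(dia >= 1 & dia <= 5, 0, 4),
        fran = offset + findInterval(secs, c(0, 1.5, 4, 8, 10, Inf)),
        LAST_COLUMN = 0
      ) %>%
      select(-offset)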
    

    (Replace LAST_COLUMN with the actual column name.)
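
    The cad column is a per-row lookup into posdi, which is the one step of the loop that does not vectorise by simply dropping the [[i]]. The following is a minimal base R sketch of such a lookup with match(). The toy posdi and Cad values are made up for illustration, and toupper stands in for str_to_upper (the two agree for plain ASCII text).

    ```r
    # Hypothetical lookup table standing in for posdi.
    posdi <- data.frame(
      cad  = c("AAA", "BBB", "CCC", "AAA"),
      font = c("Taake", "Taake", "Other", "Other"),
      stringsAsFactors = FALSE
    )

    # Hypothetical keys standing in for the total$Cad column.
    Cad <- c("bbb", "aaa", "ccc", "bbb")

    # Restrict the lookup table to the rows with font "Taake".
    taake <- posdi[posdi$font == "Taake", ]

    # For every key, find the position of its first match in taake$cad.
    # Keys without a "Taake" row get NA.
    idx <- match(toupper(Cad), taake$cad)

    # One result per key, in the same order as Cad.
    cad <- taake$cad[idx]
    print(cad)
    ```

    Keys with no "Taake" row come back as NA, so it may be worth checking anyNA(cad) before relying on the result.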

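    The findInterval call is equivalent to the following for 0 <= secs < 10 (outside that range case_when returns NA, whereas findInterval returns 0 for negative values and 5 for finite values of 10 or more):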

        case_when(
          secs >= 0 & secs < 1.5 ~ 1,
          secs >= 1.5 & secs < 4 ~ 2,
          secs >= 4 & secs < 8 ~ 3,
          secs >= 8 & secs < 10 ~ 4
        )
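
    To see how findInterval treats the boundaries, and how the offset combines with it, here is a small base R sketch. The secs and dia values are made up for illustration. The ifelse step that blanks out values outside [0, 10) is an optional addition, only needed if you want the same NA that case_when would give there.

    ```r
    # Break points taken from the transmute call above.
    breaks <- c(0, 1.5, 4, 8, 10, Inf)

    # Hypothetical values: in-range, boundary, and out-of-range cases.
    secs <- c(0, 1.49, 1.5, 3.9, 4, 7.99, 8, 9.99, -1, 10, 25)

    # findInterval counts how many break points are <= each value,
    # so intervals are closed on the left and open on the right.
    bucket <- findInterval(secs, breaks)
    print(data.frame(secs, bucket))

    # Optional: reproduce case_when's NA outside [0, 10).
    bucket_na <- ifelse(secs >= 0 & secs < 10, bucket, NA_integer_)

    # Offset: 0 when dia is in 1..5, 4 otherwise (hypothetical dia values).
    dia <- c(1, 5, 6, 7, 3, 2, 6, 1, 4, 7, 2)
    fran <- ifelse(dia >= 1 & dia <= 5, 0, 4) + bucket_na
    print(data.frame(dia, secs, fran))
    ```

    Whether out-of-range secs should become NA or fall into an extra bucket depends on your data, so treat the ifelse step as one possible choice.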